UNDERGRADUATE STATISTICAL EDUCATION*


S. S. Wilks
Princeton University 


As we enter the second half of the twentieth century, one of the most
serious long-range problems confronting the world of statistics is
that of establishing a sound pattern of statistical education for college
undergraduates. The uses of statistics and statistical methods have
become so widespread throughout business, industry, government, and
scientific research in recent years that this matter concerns not only
the professional statisticians themselves, but many thousands of persons
in these fields who are finding it necessary to become intelligent
consumers of statistics and statistical methods. Our present difficulties
are deep-rooted, but in the time which Association tradition has assigned
me for a retiring presidential address, I wish to discuss some of
the major aspects of this situation and to propose some steps for dealing
with it.


ADVANCED STATISTICAL TRAINING WELL UNDER WAY 


I shall not be concerned with specialized graduate training in statis- 
tics. It was a critical problem until fairly recently, but it is now being 
gradually solved, thanks to the emergence of a few strong centers for 
advanced training in applied and theoretical statistics and to the entry 
of increasing numbers of able students into these fields. The sustained 
influence of the various research journals in applied and theoretical 
statistics over a period of years, and the effect of discussion of the 
teaching of advanced statistics during the last ten years under the 
leadership of Committees of the Institute of Mathematical Statistics, 
the National Research Council, the Royal Statistical Society, and the 
Inter-American Statistical Institute, have been important factors in 
the establishment of these centers. Advanced training is now well
under way and will steadily improve. It is carried on not only in
university centers but in certain government agencies and laboratories.
We already have a substantial flow of excellent books from the
publishers for students at the advanced level, and this will have a telling
effect in a few years. So let us proceed to the problem of undergraduate
statistical education.

* Presidential address at the 110th Annual Meeting of the American Statistical Association, Chicago, December 28, 1950.


GROWTH OF STATISTICS AND PRESENT TRAINING 


Any person who will retrace the path by which statistics has reached 
the position it now occupies in government, business, industry, and
scientific research will be deeply impressed by the acceleration with 
which this development has occurred and by the extent to which it now 
penetrates most compartments of these broad fields. He will be shocked 
but not surprised by the topsy-like educational structure which has 
grown up in response to the training needs of this movement. 

The great acceleration is highlighted by such facts as these: the 
membership of this Association has grown from 1700 (in 1935) to 4500; 
the American Society for Quality Control, founded in 1945, now has 
over 3500 members; the two-year-old Biometric Society now has over
900 members; the Institute of Mathematical Statistics, founded in
1935, now has more than 1200 members; the Econometric Society and
the Psychometric Society, both founded during the past 20 years, have
memberships numbering in the hundreds. And there are at least a dozen
other fairly new organizations which have strong interests in the use of 
statistics and statistical methods in various special fields. The extent 
of penetration is well indicated by the titles of the more than 100 
papers presented at this 110th Annual Meeting of the Association. 
Any doubts that remain will be dissipated by looking through current 
government reports, by talking to men in business and industry about 
current problems, and by browsing through recent issues of research 
journals in astronomy, biology, chemistry, education, economics, engi- 
neering, geology, medicine, physics, psychology, or sociology. 

This rapid and far-flung growth in the use of statistics and statistical
methods has caught thousands of people unprepared, with little or no
training in statistics. They did not receive any when they were in high
school or college. It is not surprising that we find so many people with
so many different interests trying to pick up some training in the subject
any way they can. Some attend evening classes; some attend short courses
and clinics; others simply read books. In our colleges and universities,
we find instruction in elementary statistics being given by teachers in
many departments and schools. Many of these teachers have recognized,
only in recent years, the significance of the rise of statistics for
their own special fields of interest. They have taken the initiative of 
learning some statistics which they can use in their own research and 
can pass on to their students. These teachers have pioneered in helping 
to satisfy immediate instructional needs—needs which would still be 
unfulfilled if they had not stepped forward. 

These efforts to meet an immediate requirement for instruction in
statistics are highly commendable and are still urgently needed. No
one who sees the magnitude of the statistical movement and the shortage
of personnel trained in statistics can argue objectively that the
job as a whole could have been done better some other way. But, even
so, it is a fact, which we must recognize, that our statistical instruction
has been ad hoc training for immediate purposes and not education
adequate for our long-range needs. As a whole and as it now stands, our
statistical education structure is something like a group of temporary
barracks which have been built in a hurry, built in piecemeal fashion
with very little design, and built on poor foundations. If the need for
statistics and its value have any permanence in our society, then there
is no question but that this temporary structure must sooner or later give
way to some kind of permanent building. In other words, we must see
to it that college students who are likely to have to know something 
about statistics do not go through college in the future without learning 
the elements of the subject—at least the elements! 


STATISTICS IN MODERN SOCIETY 


The case for a better educational structure for statistics rests
on the premise that statistics is fundamentally important, and in
fact indispensable, to modern society. It is already clear that this
premise is well-founded. But it is easy to give it further support
by taking a closer look at the role statistics now plays in
government, business, industry, and scientific research.

The basic purpose of statistics in government has been, and must 
continue to be, that of providing a systematic and effectively main- 
tained body of information about social and economic activities and 
conditions of the city, county, state, or nation. This role, as far as the
federal government is concerned, has been excellently formulated by Frederick
C. Mills and Clarence D. Long in their report on the “Statistical Agen- 
cies of the Federal Government” which was prepared for the Hoover 
Commission in 1948. To use their words: 


... full account must be taken of the needs of a modern society for ac- 
curate current information concerning the processes of economic and social 
life. As the division of labor in economic affairs becomes more complex, as
economic and social systems come to involve diverse combinations of self- 
adjusting and consciously controlled operations, as governmental structure 
itself becomes more highly developed, the need for current statistics is in- 
tensified. The supervision of interstate commerce and the regulation of the 
banking system, public utilities, and the exchanges must be guided by 
knowledge of a wide range of relevant facts. The formulation of policy and 
legislation relating to conservation, crime suppression, public health and 
social security, the provision of housing and education, the settlement of 
industrial disputes, the maintenance of economic opportunities and to other 
aspects of national life must be based upon accurate, current information. 
If the competitive enterprise system is to function effectively private de- 
cisions concerning investment, production, and distribution must proceed 
from a knowledge of market conditions much broader than that available 
to individuals guided only by their own observations. In war, even more 
than in peace, the facts of manpower and its distribution, of natural re- 
sources and national wealth, of the capacity and quality of industrial equip- 
ment, must be available promptly and in detail if mobilization and use of 
resources are to be effective. 

Recent statistical developments have been marked not merely by an 
increase in the number of reports and in the extent of their use. The char- 
acter of the contribution made by statistics to Government and to private 
administration has been profoundly modified within the last quarter century. 
Enumeration for purposes of historical study persists; the record of national 
life is contained in measurements of changes in population, production, 
wealth and other historical series. But, in high degree the emphasis in the 
work of the statistician has shifted from this backward-looking process to current 
affairs and to proposed future operations and their consequences. Experiments 
are designed, samples selected, statistics collected and analyzed with reference to 
decisions that must be made, controls that must be exercised, judgments that 
entail action. The growth of the statistical services over the last several 
decades reflects this change in the function of statistics. 


The italics are mine, but the statement in italics has fundamental 
significance for statistics in the future. The implication is clear. The 
statistical process is becoming more and more widely used as a scientific
method, both as a basis for developing policy and as a basis for
making administrative decisions, not only in government but in
private administration.

As for business and industry, the uses of statistics and statistical
analysis are rapidly mounting. The statistical quality control movement,
due largely to W. A. Shewhart, shows how fast statistical methods
will spread within a segment of industry when properly developed
and introduced. There are other segments of the overall industrial
process, such as research and development, product testing, and market
research, in which statistical methods are well on their way, others
where the use of these methods has only just started, and still others
virtually untouched. As business and industry become more complex,
executives will depend more and more on scientific statistical methods
for collecting, analyzing, and interpreting information and for
decision-making. These methods will play an important role in “operations
research,” an activity now confined largely to the scientific study of
military operations, but which may eventually figure prominently in
the scientific analysis of business and industrial operations in
connection with executive decision-making. Highly competent personnel
will be needed in all of this work, and many of these persons will come
from our business schools.

The increasingly important part that statistics is playing on the
frontier of nearly every scientific field is unmistakable, to judge
by the growing frequency of statistical analysis in the research articles
appearing in scientific journals of all kinds. If these methods
are not useful and effective in scientific investigations, then a lot of
scientists are fooling themselves and their hard-to-fool colleagues!

Now, let us turn from the technical role of statistics in our modern 
society to its significance in the general education of the intelligent 
citizen who graduates from college or even from high school. This 
citizen lives in a world of facts and figures. He makes decisions all of 
the time on the basis of large or small amounts of information. He 
carries on in a mass production economy. He is bombarded with ad- 
vertising claims by every device of mass communication. He covers his 
risks by insurance. He occasionally plays poker, canasta or bridge, and 
sometimes gambles a little. His children take intelligence tests and their 
scores are reported to him. At present the only tools he has for critical
evaluation and decision in all of these matters are experience and
common sense, and these often fail him. Would anyone deny that this
citizen would be able to carry on a little more intelligently in his
complicated twentieth-century environment if he had been taught a few of the
elementary concepts of probability, statistics, and logic at about the
same time that he was exposed to plane geometry and trigonometry in 
high school? Perhaps H. G. Wells was right when he said “statistical 
thinking will one day be as necessary for efficient citizenship as the 
ability to read and write”! 

If this fundamental importance of statistics in our modern society is
granted (and I do not see how any reasonable person can deny it),
then we must concern ourselves with the problem of building a sound
pattern of statistical education in our colleges and with introducing
some of the important concepts into our high schools. It would be
impossible in the space available here to go into a detailed discussion of
the steps which might eventually be taken to ensure a sound scheme of
statistical education in colleges and high schools, and of all the
implications of such steps. I can only hope to open up the subject by
broadly outlining a few major steps which I believe should be considered,
in the hope that these proposals will provoke further thought, discussion,
and some action.¹


THE CENTRAL PROBLEM OF UNDERGRADUATE STATISTICAL 
EDUCATION AND A PROPOSED SOLUTION 


The basic difficulty with our statistical education at present can be 
very simply stated: it does not extend far enough down into our
educational scale. This is due in part to the fact that the subject is a
relative newcomer to our cultural pattern and educational system.
Its values have been discovered largely outside of academic walls 
while the elements of the subject have been seeping slowly into the 
colleges and universities through the top, by way of research, graduate 
courses and upper class courses. The elements of statistics have become 
almost frozen into the curriculum at these upper levels. 

It is gradually becoming recognized that there is a body of elemen- 
tary concepts and basic skills in probability, statistics, logic, and ex- 
perimental philosophy, together with a certain amount of prerequisite 
mathematics which constitutes the core of the scientific method which 
pervades modern experimentation and scientific investigation what- 
ever the field may be. I shall simply call this body the “elements of 
statistical analysis and inference” and will not attempt to make a de- 
tailed outline of the specific topics, concepts, skills, processes and pro- 
cedures which would be covered. Others may wish to call it “quantita- 
tive methods.” Even after careful selection of material, this body is 
still bulky. Effective presentation would be required in order to boil it 
down into a sequence of two full-year courses. 

The eventual acceptance of this body of knowledge as fundamental 
in the early stages of training of students in the biological and social 
sciences will be particularly significant. For this will mean the emer- 
gence of a plan by which students of these sciences will receive dis- 
ciplined training in certain principles of scientific method basic to those 
sciences early enough to be useful to them in later courses and thesis 
work. Such a program would play a role in the biological and social
sciences similar to that now played by mathematics through calculus
for the physical sciences and engineering. At the present time parts and
bits of this material are scattered through upper-class and graduate
courses and given to the students too late. Under the proposal being
made here, this material would be systematically organized and
supplemented and would be presented early in college.

¹ Some aspects of the problem of statistical education in the United States were discussed in two reports published in 1947. “Personnel and Training Problems Created by the Recent Growth of Applied Statistics in the United States” was issued in May 1947 by the Committee on Applied Mathematical Statistics of the National Research Council. “The Teaching of Statistics” was issued in September 1947 by the Committee on Teaching of the Institute of Mathematical Statistics.

For such a plan to be successful, the body of concepts and skills to
which we have referred will have to be moved down into freshman and
sophomore courses, and students planning to go into the biological and
social sciences will have to be required to take them as freshmen and
sophomores. This would give them the foundation for the scientific
method they will need in later courses and thesis work, just as students
of the physical sciences and engineering now receive a foundation in
classical mathematics through calculus in their freshman and sophomore
years. I do not wish to imply here that physical science and
engineering students do not need probability and statistics. They need 
much more than they are now receiving. Probability theory is funda- 
mental in such fields as quantum mechanics, nuclear theory and diffu- 
sion processes; modern statistical methods are now commonly used in 
the analysis of experimental results obtained in physical and chemical 
laboratories; statistical quality control is now a standard procedure in 
thousands of industrial mass-production processes. 

The principles of statistical analysis and inference have filtered here 
and there into sophomore courses and in a few bold instances into 
freshman courses. But this filtering must be accelerated—accelerated 
until these elements are taught as required courses for social and 
biological sciences in the freshman and sophomore years. In spite of 
the compelling logic of this situation there is some dissenting opinion. 
Much of this contrary opinion bases its case on the assumption that 
freshmen or sophomores are too immature to understand or appreciate 
properly the fundamental concepts and basic skills of statistics. There 
are those who feel that it is necessary for the student of elemen- 
tary statistical analysis to spend hours of computation on almost life- 
sized research problems and who then argue that such procedures do 
not mean much to freshmen and sophomores. It is true that such com- 
putation means little to freshmen and sophomores. But the first con- 
tention is faulty educational philosophy. By the same type of argument 
the aerodynamicist should learn all of the basic concepts and skills 
which he will need in designing a new type of wing by designing a wing, 
or the structural engineer should learn all the basic concepts and skills 
required for designing a new type of bridge by designing a bridge. 





8 AMERICAN STATISTICAL ASSOCIATION JOURNAL, MARCH 1951 


Actually, each of these men learns the needed concepts and skills sys- 
tematically and step-by-step all along the way up through high school 
and college. Others argue that a student often does not know, as a
freshman or sophomore, whether he will major in one of the social
sciences, one of the biological sciences, business administration, or
some other field; how, then, is he to know that he should study the elements
of statistical analysis and inference at this early level? This argument
only strengthens the case for concerted effort by departments in the 
biological and social sciences to see that early courses on these elements 
are established and required of students who enter those departments. 
It would soon become clear to students and faculty advisors that these 
courses must be taken early. After all, a similar principle operates for 
students in the physical sciences and engineering, with reference to two 
years of mathematics. 

What we need in statistics are elementary courses at elementary levels 
in which the student can concentrate on fundamental concepts and basic 
skills in a graduated manner, doing just enough problems and laboratory 
exercises to fix these ideas without losing himself in the meaningless 
manipulation of formulas. If these elements are presented clearly and 
systematically to a student early in his college career he will be in a 
position to use them with facility and understanding in later courses, 
in thesis work, and in life-sized problems. If properly organized this 
basic material can be presented eventually in a sequence of two full- 
year courses, just as the basic mathematics for students in the physical 
sciences and engineering is now usually presented in two full-year 
courses. As a practical matter it may be desirable to develop these
courses in two stages: in the first, organizing material for the first
course and trying it out for several years; in the second, doing the
same for the second course once the first course is running
satisfactorily.


SOME ORGANIZATIONAL ASPECTS OF THE TEACHING 
OF ELEMENTARY STATISTICS 


The elements of statistical analysis and inference are fundamentally 
the same when properly taught, whether in biology, economics, mathe- 
matics, psychology, or sociology. When the teaching of this material is 
reduced to its essentials it will become obvious to all, including college 
and university administrators, that such courses are general college 
courses. They will ask themselves whether such courses should be given 
by a half-dozen departments or sponsored by some single department 
or committee. The answer to this question will differ from one college
to another. In some cases departments of statistics will be created. In
others, departments of mathematics will handle the courses. In others,
special committees will sponsor the courses. It has been argued by some
that the development and teaching of such courses should ultimately
be made the responsibility of the departments of mathematics. One of
the real difficulties with this latter possibility is the lack of
understanding of what is really needed in such courses, at least in most
of those departments of mathematics which have no member thoroughly
trained in probability and statistics. In general, departments
of mathematics are so thoroughly bound by tradition to the
classical mathematical preparation of students for the physical sciences 
and engineering that they find it difficult to imagine that students in 
the biological and social sciences need any training in quantitative 
methods at all, or if they do think these students need such training, 
they do not see why it should be any different from the classical mathe- 
matical training of students in the physical sciences or engineering. A 
few departments of mathematics have not even introduced a modern 
course in probability and statistics for students in mathematics and the 
physical sciences, although the job of meeting the probability and statistics
needs of such students is fairly simple, since they are normally equipped
with at least two years of college mathematics. But, thanks to the
steady influence of the Institute of Mathematical Statistics and a few 
strong centers in mathematical statistics, the number of departments 
of mathematics in this situation is rapidly vanishing. However, even 
fewer departments of mathematics have attacked the problem of sta- 
tistics for undergraduate engineering students. This, too, could be done 
quite simply by introducing a good course in statistical quality control 
and in industrial experimentation. 

If the elementary material in statistical analysis and inference is 
organized into general college courses and brought to the freshman 
and sophomore levels, the upper class courses now covering elementary 
statistics in each department or field could be converted into courses 
which deal directly with the quantitative problems of that field— 
courses in which elementary statistical methods could be used in stride 
and which would better serve the needs of the department or field. We
would then approach a situation similar to that which exists in the
physical sciences and engineering with respect to mathematics through
calculus. The teachers of upper-class courses in these subjects are not
hampered by having to stop and teach elementary mathematics
to their students. They make immediate and uninhibited application,
to the problems at hand, of basic mathematical concepts and
skills learned by the students as freshmen or sophomores. We must
look forward to the time when teachers in the biological and social 
sciences will be able to make similar use of the elements of statistical 
analysis and inference. 


INDICATIONS FAVORABLE TO AN EARLY SOLUTION 


We have seen how elementary statistics has been introduced into 
our colleges and universities through the top, into graduate and upper 
class courses, and that there has been a tendency for it to become lodged 
at this upper level. But there are some definite indications of forces at 
work which may, in the near future, bring this instruction down to the 
freshman and sophomore levels where it belongs. 

First let us take a look at the social sciences. Elbridge Sibley of the 
Social Science Research Council made a very significant study in 1947 
of “The Recruitment, Selection and Training of Social Scientists” 
which throws much light on what graduate students in social sciences 
think about their own undergraduate training. Questionnaires were 
filled out and returned by 581 social science graduate students (out 
of 1080 approached) and 575 natural science graduate students (out of 
930 approached). One of the most significant findings of this report
is that, when asked what they regarded as the most serious deficiencies
in their own undergraduate training, the social science students
mentioned mathematical and statistical training more often than any
other. Similar results
were found by C. C. Brigham in a study made in 1940 for the Com- 
mittee on Research Training of the Social Science Research Council 
which covered responses from 196 individuals out of the 330 engaged 
in research in different social sciences who were nominated by their 
respective professors in 20 leading universities as superior candidates 
who received Ph.D. degrees during the period 1925–1935. In a
mail questionnaire these individuals were asked to state what limitations
in their training they now felt. Of the 196 who responded to the
questionnaire, 120 stated that they wished they had had more
mathematical or statistical training, or both. The testimony of social
science students themselves as revealed by these two studies speaks 
louder than any words that can be set forth here of the desirability of 
moving on toward adequate training in quantitative methods at 
the undergraduate level for social science students. 

Recent action has been taken in connection with the problem of 
mathematical training in the social sciences. A meeting sponsored by 
the Econometric Society, the Institute of Mathematical Statistics, and 
the Mathematical Association of America discussed this subject in
1948. Participants from economics, sociology, psychology, mathematics 
and statistics discussed various aspects of the problem. One of the 
results of this meeting was the adoption of a resolution that there be 
established a committee to make a thorough study of the problem of 
introducing adequate mathematics and statistics into the training of 
social scientists. Such a committee has been set up under the chairman- 
ship of W. G. Madow and it has representatives from most of the social 
science organizations. This committee has an opportunity to render a 
unique service in helping to solve the problem of adequate training in 
mathematics and statistics for social science students. 

The problem of effective mathematical and statistical education for 
students in the biological sciences appears to be similar to that which 
exists in the social sciences. It would be illuminating to see the results 
of a study for the biological sciences similar to that made by Sibley for 
the social sciences, and specifically to know what graduate students in 
the biological sciences think about their own mathematical and sta- 
tistical preparation. The Section on Training and the Biometric Section 
of the Association are in an excellent position to conduct a cooperative 
study of this problem and to take whatever action the results of such a
study indicate.

The most encouraging indication that forces are at work in the bio- 
logical sciences to direct more attention to statistical education is the 
rapid growth of the Biometric Section of this Association and more 
recently of the Biometric Society. But these forces have been operating
mainly at research and advanced levels. Sooner or later they must 
focus some attention on the problem of teaching the elements of sta- 
tistical analysis and inference at the freshman and sophomore levels. 

With the expanding use of statistical methods in business both for 
internal control of operations within individual companies, and for 
market research, business forecasting, and other studies relating to 
outside conditions, it is extremely important that statistical training of 
business students be of the highest quality. Donald R. Belcher, now 
Treasurer of the American Telephone and Telegraph Company, made 
the following excellent statement in 1924 about the role of the statis- 
tician in business and the need for improving the quality of his training: 

I submit then that the job of the statistician is to observe carefully 
and correctly, to treat his data honestly and dispassionately, to reason ob- 
jectively from a set of conditions to their inevitable consequences; that 
nowhere else in his academic experience can he find so excellent a field for
development as in the mathematical and physical sciences. This fact will,
some day, I hope and believe, receive adequate recognition on the part of
those men who are responsible for statistical training in the colleges. When
the time arrives, we shall see statistical training characterized by a scientific 
dignity commensurate with the needs of the modern business world. 


This statement was made more than 25 years ago. How much closer
are we now to the goal Mr. Belcher had in mind? It
would be very useful to know how graduates of business schools now 
employed in business feel about the statistical training they have re- 
ceived. The new Section on Economic and Business Statistics and the 
Section on Training could render a great service both to business and 
statistical education by jointly making a study of the adequacy of the 
statistical training of business students and taking appropriate action 
on the basis of such a study. 


STATISTICAL EDUCATION IN SECONDARY SCHOOLS 


If the elements of statistical analysis and inference are effectively 
organized into elementary courses and moved down into freshman and 
sophomore courses in our colleges and universities, the question will 
certainly arise as to what preparation in high school the student needs 
for these college courses. There is no doubt but that the study of 
mathematics in high school is an essential prerequisite for the study of 
statistical analysis and inference in college. The basic question is 
whether the mathematics curriculum in high school as now constituted 
is fully effective. The subject matter covered in algebra, plane geome- 
try, trigonometry and solid geometry is classical. It has been developed 
over a long period of time as a part of a tradition for preparing students 
for the physical sciences and engineering in college and for developing
“clear, concise, and logical thinking” in the intelligent citizen. There 
have been slight variations in order and manner of presentation of these 
subjects from one high school text book to another, but there has been 
no radical change in the subject matter during the time the concepts of 
probability and statistics have assumed such widespread importance 
in our society. This, of course, is to be expected since these concepts 
have not even become properly incorporated into our college curricula. 
But, looking forward to the time when some of the more elementary
ideas of probability and statistics may be introduced into the high
school curriculum, we should consider at least some of the initial steps
which might be taken. In my opinion the most obvious step to be con- 
sidered is that of eliminating some of the fossilized subject matter from 
high school algebra, trigonometry, and particularly solid geometry and 
replacing it by subject matter from elementary probability, statistics 
and logic. In this day and age why should high school students spend 
a lot of time manipulating trigonometric identities, applying Horner’s 
method to the location of roots of polynomials, and proving some of the
theorems near the end of plane and solid geometry? The answer seems 
to be this: many high school teachers teach these subjects because they 
believe it is useful for students to know them when they get to college; 
the college teachers want them taught because they provide added 
practice in algebraic and other mathematical operations in which col- 
lege students should have some proficiency. A thorough-going study of 
the high school mathematics curriculum should be made by a joint com- 
mittee of high school and college mathematics teachers in consultation 
with persons in the biological, physical and social sciences to determine 
to what extent some of these archaic topics can be supplanted by topics 
in elementary probability, statistics, logic, etc. Such a committee would 
almost certainly find it possible to develop a solution along these lines 
which would provide the high school student with a background of 
concepts, skills and information more useful to him in his role as a 
college student or as a citizen than the one he now obtains, while at
the same time providing equal opportunities for acquiring proficiency
in algebraic and other mathematical skills needed for pre-engineering
and pre-physical science mathematics courses. Such
a project would require a great deal of initiative, imagination, and 
boldness, but, if done well, it would bring back some of the life which 
has vanished from high school mathematics, it would help rekindle 
mathematical interest among students with mathematical aptitude, 
and would certainly not diminish the qualities which high school 
mathematics is supposed to possess for developing “clear, concise, and 
logical thinking” in the average citizen. 

If, after thorough exploration of the possibility of eliminating some 
of the obsolete topics now taught in high school mathematics in favor 
of topics in probability, statistics, logic and other modern mathematical 
subjects, it does not appear feasible to overcome the inertia of tradition, 
then, of course, some other solution would have to be sought. One 
possibility might be to develop a course in this group of topics as has 
been done in physics and chemistry and use it as an alternative to the 
typical fourth year of high school mathematics, namely, solid geometry 
and trigonometry. Such a course could be supplemented by carefully 
developed laboratory exercises covering simple examples chosen from 
various fields. 


THE SHORTAGE OF TEACHERS 


At the present time a serious limiting factor in achieving better 
undergraduate statistical education is the shortage of highly qualified 
teachers. Progress in developing such persons is very slow, and many 
of those trained to the Ph.D. level are going into colleges and
universities to carry on at research and upper class teaching levels, thereby
doing little to directly remedy the present difficulties at lower under- 
graduate levels. Others are going into government and some into indus- 
trial and business organizations. These people are all desperately 
needed where they are going. Many of them will play key roles in the 
ultimate solution of our statistical education problem. But more effort 
must be made to develop, possibly in new ways, a larger group of 
teachers up to the stage where they can and will effectively teach 
courses in the elements of statistical analysis and inference at the
freshman and sophomore levels. This group should also include M.A. and B.A.
students who go into high school teaching. 


THE PUBLIC RELATIONS OF STATISTICS 


I would now like to say a few words about the problem of public 
understanding of statistics. If a pattern of undergraduate statistical 
education is eventually built up along the lines I have indicated, the 
problem of public understanding of statistics and its role in modern 
society will be simpler than it is now. At least there will be many more 
people who will know something about the subject and its concepts. 
But, at the present time, and in spite of the phenomenal growth of 
statistics and its role in business, industry, government and scientific 
research the fact is that we are still living pretty much in Mark Twain’s 
age of “lies, damned lies, and statistics” as far as the general public is 
concerned. With a few exceptions, statistics has made a poor impression 
in the eyes of the public. One of the most notable exceptions is the 
manner in which the statistical quality control movement has de- 
veloped its public relations. This has been achieved through sustained 
effort based on a sound statistical development. 

There are many other sound statistical developments which are in 
great need of being presented effectively to major segments of the 
public by means of popular books and articles in magazines and news- 
papers. One of these is survey sampling and its many applications, 
another is design of experiments, a third is personnel selection by 
modern testing which is basically statistical. Many others could be 
mentioned. 

Effort on the public relations of statistics at the present time de- 
serves high priority. The Association has a new Public Relations Com- 
mittee under the Chairmanship of A. N. Watson, which has done excel- 
lent work during the year and especially in connection with this meet- 
ing. This Committee deserves the full support of the Association and 
its members in building better public relations for statistics. Much
initiative must, of course, rest with individual members of the Associa- 
tion to prepare and publish popular or semi-popular articles and expo- 
sitions. 


THE ASSOCIATION AND STATISTICAL EDUCATION 


The American Statistical Association is the oldest and largest sta- 
tistical society in the United States. It has wielded strong influence 
throughout its long history in the improvement of statistics, especially 
in government, and in the struggle toward a profession of statistics. 
Yet it is a conspicuous fact that in all of its 110 years of existence the 
Association has only sporadically concerned itself with the problem of 
improving statistical training and education—a problem which is 
absolutely vital in establishing a profession of statistics. Until very 
recently, it has had only one formal skirmish with the problem. In 
1925, and largely through the influence of R. E. Chaddock, the Presi- 
dent of the Association at that time, a Committee on Educational and 
Professional Standards for Statisticians consisting of five distinguished 
members of the Association was established under the chairmanship 
of J. W. Glover. The first task which this Committee took upon itself 
was to make a survey and prepare a report on the nature and extent of 
statistical instruction in American colleges and universities. One of the 
main results of this report was the discovery of very wide variation in 
the prerequisites and content of statistics courses from department to 
department within a college or a university. The Committee expressed 
deep concern over the virtual absence of any mathematical prerequi- 
sites for statistics courses at upper class levels in some departments, 
and concluded its report with the following statement: 

This divergence in point of view and practice will probably continue for a 
long time and the employer looking for trained men—not experts—must 
judge for himself from which departments he will select his assistants. Only 
time and experience will settle questions of this character and perhaps the
results accomplished by the men who are trained along these different lines
will eventually decide the matter. 


The Glover Committee also sponsored two sessions on the problem of 
instruction in statistics at the Annual Meeting of the Association in 
December, 1925. Papers were read by men from various fields including 
economics, sociology, business, public health, and biology. The general 
theme of these sessions was that the teaching of statistics was in a sad 
state and should be improved somehow or other. A great deal of con- 
sideration was given to the problem of training the expert in statistics. 
Since 1925, occasional papers have been published on statistical
education and a few sessions have been devoted to the subject at An- 
nual Meetings. A project for an intensive study of the status of sta- 
tistical training was considered in 1939 but was never carried through. 
No further Association action was taken with respect to statistical 
education until 1944 at which time a special committee was appointed 
by Helen Walker, who was President of the Association then, to con- 
sider the problem of statistical training. The work of this committee 
eventually led to the establishment of the Section on Training in 1947. 
This Section is only just getting under way. Its main efforts thus far 
have been devoted to the preparation of sessions for the Annual
Meeting. But, with the pressures and opportunities now mounting for better
statistical education at the lower undergraduate level, the Section faces
very serious responsibilities. These responsibilities extend far beyond 
the job of merely organizing sessions on statistical teaching and training 
clinics for the Annual Meeting. They include the development of out- 
lines of material and even texts for elementary courses, encouraging 
the preparation of articles on methods of teaching statistical concepts, 
consideration of problems of statistical education at the high school 
level, promoting “statistics for the citizen,” and many other activities. 

The Section on Training in cooperation with the Section on Eco- 
nomics and Business Statistics has the responsibility for making a thor- 
ough inquiry into the present status of specialized statistical training 
for students of economics and business beyond the general courses 
which have been proposed for freshmen and sophomores and making 
recommendations for improvements. Similar opportunities and obliga- 
tions exist for joint action between the Section on Training and each of 
the following: the Biometrics Section, the new Committee on Statisti- 
cal Methods in the Physical Sciences, the new Committee on Statistics 
in the Social Sciences, and the Committee on Public Relations. 

The challenge before all of the statistical societies for the improve- 
ment of statistical education is great. At the advanced level progress 
is gradually being made in the various special fields like biometrics, 
psychometrics, econometrics, and mathematical statistics under the 
stimulation of the Biometric, Psychometric, and Econometric Societies 
and the Institute of Mathematical Statistics. At the undergraduate 
level, we find no leadership anywhere; there is a state of general confu- 
sion. The time is ripe for some vigorous leadership. The Association 
through its Section on Training is the only organization which can 
reasonably assume the leadership at this time. In taking the initiative,
the Section will gain the support not only of other Sections and
Committees of the Association, but of other statistical organizations, and of
leaders in the biological and social sciences who realize the need for 
better statistical education. It must meet the challenge. 


SUMMARY


In summary, I would like to emphasize in the strongest terms that 
one of the most serious problems that faces all of us who are directly or 
indirectly concerned with statistics as we enter the second half of the 
twentieth century is that of developing an adequate program for under- 
graduate statistical education. On an effective solution of this problem 
depends our ability to meet the training requirements which will be 
demanded by the role which statistics and statistical method will ulti- 
mately play in government, business, industry, and all types of scien- 
tific research. Graduate and research training in both applied and 
theoretical statistics is now making satisfactory progress, thanks to the 
influence of the Biometric, Econometric, and Psychometric societies 
and the Institute of Mathematical Statistics and to the existence of a 
few strong advanced training centers in these various fields. 

The basic fault with our present statistical instruction is that it does 
not extend far enough down into our educational scale. Much of it is 
hurried ad hoc training given mechanically and without adequate 
foundation and given too late to be useful to students while still in 
college. We have been and are still frantically devising all sorts of 
quick training courses to meet immediate needs of those who have gone 
through college without receiving any training in statistics. The situa- 
tion is most critical for students going into social and biological sciences. 
We must now look to the job of correcting these faults in our under- 
graduate statistical education. 

The essence of the solution of the problem lies in (a) developing a 
sequence of two full-year courses which will consist of concepts and 
skills mainly from probability, statistics, logic and experimental phi- 
losophy, together with some prerequisite mathematics, (b) placing 
these courses in the freshman and sophomore years, and (c) requiring 
students expecting to go into the social and biological sciences
(including students of business) to take these courses as freshmen and
sophomores.

The proposed solution is based on the now widely supported assump- 
tion that students in these fields need basic training in scientific 
method appropriate to their fields analogous to that now received by 
engineering and physical science students in mathematics through 
calculus. 
Taking these courses as freshmen and sophomores would give
students in these sciences an opportunity to begin building a
foundation in scientific method for their fields at a stage early enough to
enable them to use it in later college work. 

When this step has been taken it will be clear to all that the essential 
concepts and skills now commonly taught in beginning courses in sta- 
tistics at upper class levels in various departments in our colleges and 
universities are basically the same in all courses. This will necessitate 
some kind of a coordinating committee or departmental responsibility 
in sponsoring such courses. 

The solution of the problem of providing effective undergraduate 
statistical training will raise the question of changes in the high school 
curriculum to provide better high school preparation for these college 
courses. The simplest solution to this question would be the replace- 
ment of certain obsolete topics now taught in high school algebra, 
plane and solid geometry and trigonometry by some topics in the ele- 
ments of probability, statistics, and logic. Such changes, when care- 
fully made, would not only add significantly to the interest and value 
of high school mathematics for the average citizen, but would also pro- 
vide ample opportunities for practice in algebraic and other mathe- 
matical operations in which proficiency is required for college mathe- 
matics. 

Various factors have been discussed in this paper which indicate that 
the time is now ripe for some vigorous leadership in solving the problem 
of undergraduate statistical education. The only society which can rea- 
sonably assume the responsibility for this leadership is the American 
Statistical Association through its Section on Training, acting in co- 
operation with the other Association Sections, the Committee on Sta- 
tistical Methods in the Physical Sciences, the Committee on Statistics 
in the Social Sciences, the Public Relations Committee, and other
statistical societies. It will receive the cooperation of many leaders in
the biological and social sciences who realize the need for better 
statistical education and are convinced that action is needed. The 
challenge is great and it must be met. 





THE INFLUENCE OF STATISTICAL METHODS FOR 
RESEARCH WORKERS ON THE DEVELOPMENT 
OF THE SCIENCE OF STATISTICS 


F. YATES 
Rothamsted Experimental Station 


It is now twenty-five years since R. A. Fisher’s Statistical Methods
for Research Workers was first published. These twenty-five years 
have seen a complete revolution in the statistical methods employed in 
scientific research, a revolution which can be directly attributed to the 
ideas contained in this book, and which has spread in ever-widening 
circles until there is no field of statistics in which the influence of 
Fisherian ideas is not profoundly felt. 

Statistical Methods for Research Workers is a peculiarly personal
production. It was written after five years’ work at Rothamsted, the largest
and oldest of the British agricultural research stations, where Fisher 
had been appointed in 1919. At that time the idea of employing a statis- 
tician in such a field was a novel one. It was thought by the Director, 
Sir John Russell, that the accumulated results of the Rothamsted ex- 
periments would repay further examination by a mathematical statis- 
tician. In the event the appointment had much more far-reaching 
effects, as it resulted in the evolution of new statistical methods suitable 
for dealing with experimental material, and in the radical improvement 
of experimental design. 

Statistical Methods embodies the results of Fisher’s researches during 
his early years at Rothamsted. The methods put forward are largely 
those developed by the author himself to deal with the novel problems 
encountered as a result of his contacts with agricultural and biological 
research workers. They are based on the results of his own researches in 
mathematical statistics, the more important of which have recently 
been published in collected form [1]. The book is brief—the first edition 
contained only 239 pages of large type (350 words to the page), and 
the present edition (the 11th) contains only 354 pages of slightly 
smaller type. No mathematical proofs are included, and the discussion 
of the various subjects is by no means exhaustive. Apart from the 
addition of a chapter on estimation in the second edition, there have 
only been relatively minor additions to subsequent editions. 

To appreciate fully the achievement which the book represents, we 
must recall the statistical atmosphere of the time. It was the age of 
correlation and curve fitting. In Tables for Statisticians and
Biometricians, for example, first published in 1914, 37 per cent of the
tabular matter was concerned with curve fitting, and a further 18 per cent
with various forms of correlation. The normal and Poisson distributions
occupied 17 per cent, χ² and “Student’s” z 5 per cent, the remaining 23
per cent being devoted to tables of basic mathematical functions and 
miscellaneous statistical tables. 

It was also the age of coefficients of all kinds. In attempts to assess 
the degree of association in 2×2 contingency tables, for example, such
measures as the coefficient of association, the coefficient of mean square 
contingency, the coefficient of tetrachoric correlation, equiprobable 
tetrachoric correlation, and the coefficient of colligation, were pro- 
posed. The way in which these coefficients were used revealed consider- 
able confusion between the problems of estimating the degree of associ- 
ation, and testing the significance of the existence of an association. In 
the field of regression and correlation we find, in addition to the ordi- 
nary partial regression and correlation coefficients, the multiple correla- 
tion coefficient, the correlation ratio, and Blakeman’s criterion. Even 
such a simple concept as the percentage standard deviation was termed 
the coefficient of variation.¹

Statistical Methods cut through this jungle, and broke fresh ground 
in a number of entirely distinct ways. It recognized, and emphasized, 
the difference between the problems of estimation and tests of signifi- 
cance. It set out methods for the exact treatment of sampling problems 
of the type that arise in the commonly required tests of significance 
and introduced a unity of approach into these problems which was 
previously lacking; apart from tests involving only the “classical” dis- 
tributions, the normal, binomial, and Poisson, the whole of the tests 
discussed are shown to be dependent on three fundamental
distributions, χ², t and z, of which the first two are special cases of the last. It
showed how, by the use of exact methods, many of which are of im- 
portance even with quite large samples, the small samples that occur 
so frequently in experimental work, but somewhat rarely in observa- 
tional data, can be treated statistically. It recognized for the first time 
the importance of efficiency in estimation processes, and described a 
method (the method of maximum likelihood) for obtaining efficient 
statistics in practical cases. (This aspect was more fully developed in 
the second edition.) Finally, it laid the foundations of sound experi- 
mental design and analysis. 

In the following sections of this paper, I shall endeavor to describe 
in a little more detail those features of the book which were of particular
novelty at the time, and which exerted most influence on the subsequent
development of statistical science.

¹ Readers interested in the multiplicity of coefficients, etc., current at this time may consult
Kendall [9], which contains a very full account of a great number of them.


GENERAL PRINCIPLES 


The introductory chapter, which is chiefly concerned with concepts 
and basic principles, is a masterpiece of brevity and clarity. Nowhere, 
I think, is it possible even today to find such a lucid outline of the 
scope of statistical science, together with a record, in non-mathematical 
terms, of the ideas that Fisher had developed on estimation, maximum 
likelihood, tests of significance and the like. The concepts of consist- 
ency, efficiency, and sufficiency of estimates are introduced. The ne- 
cessity of using efficient statistics when testing for goodness of fit is 
stressed. Inverse probability is rejected. This chapter must indeed 
have had a profound influence in spreading the new concepts among 
those who were chiefly concerned with statistics as a practical tool in 
research enquiry. 


DISTRIBUTIONS 


The chapter on distributions sets a brisk pace which is maintained 
throughout the book. A lot of junk is cleared away. The lengthy dis- 
cussions of measures of the “central tendency” and of dispersion found 
in most of the statistical textbooks of the time are dispensed with—the 
median is not mentioned, and the probable error is dismissed with the 
characteristic Fisherian phrase: “The common use of the probable 
error is its only recommendation.” The use of the mean and the mean 
square estimate of the variance are justified by their sufficiency proper- 
ties for the normal curve, and the reader is introduced to the use of the 
normal probability integral, the fitting of a normal curve, grouping 
corrections and errors, the use of n − 1 instead of n as divisor of the sum
of the squares of the deviations, and tests of departure from normality 
by third and fourth moments, all in the space of 14 pages. In the next 
17 pages he is expected to become familiar with the Poisson and bi- 
nomial distributions, including tests of the variability of small samples 
from those distributions by means of the χ² distribution.


χ²


The chapter on the use of the χ² tests of goodness of fit, independence
and homogeneity likewise covers a great deal of ground, some of it
simple and familiar to the statisticians of the time, and other parts
which represented novel and difficult applications. Its most important
immediate influence was to make available to the ordinary
non-mathematical statistician a coherent account of the uses of χ², freed from the
confusion that had existed regarding the number of degrees of freedom
which were appropriate in different cases. Although the point had been
under discussion for 10 years, had been very fully treated
by Fisher in scientific papers (papers 5, 7 and 8 of [6]), and had, as far
as 2×2 tables were concerned, been correctly treated (though by
approximate methods) by Yule in his Introduction to the Theory of Statistics
[14] from its first publication in 1911, it was still the subject of
controversy, and the reviewer of Statistical Methods in the British Medical
Journal [2] in 1926 felt it necessary to write:


The trained statistician interested in Mr. Fisher’s researches will miss 
a detailed justification of his conclusions. ... Even if the statement that 
Professor Pearson’s treatment of a fundamental problem contained a
“serious error” had not been disputable, and therefore improper in a work
addressed to elementary students, it would have reminded anyone of
Macaulay’s remark on a similar occasion—“just so we have heard a baby,
mounted on the shoulders of its father, cry out, ‘how much taller I am than
Papa!’”


Actually, the point was discussed by Fisher in a passage (Section 20) 
which for clarity of statement and convincingness of argument would 
be difficult to better: 


It was formerly believed that, in entering the χ² table, n was always to
be equated to one less than the number of frequency classes; this view led 
to many discrepancies, and has since been disproved with the establishment 
of the rule stated above. On the old view, any complication of the hypothesis 
such as that which in the above instance admitted differential viability, 
was bound to give an apparent improvement in the agreement between ob- 
servation and hypothesis. When the change in n is allowed for, this bias dis- 
appears, and if the value of P, rightly calculated, is many fold increased, as 
in this instance, the increase may safely be ascribed to an improvement in 
the hypothesis, and not to a mere increase of available constants. 
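To make the degrees-of-freedom point concrete, here is a minimal Python sketch, assuming a hypothetical 2×2 table of counts (the numbers are invented for illustration). It computes Pearson’s χ² with expectations taken from the margins, then reads the upper-tail probability P under Fisher’s rule (1 degree of freedom, because the margins are fitted from the data) and under the older rule (number of classes minus one, i.e. 3). The helper names are hypothetical, and the closed forms used for P hold only for 1 and 3 degrees of freedom.

```python
import math

def chi_square_2x2(table):
    """Pearson's chi-square for a 2x2 table, expectations from the margins."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def chi2_upper_tail(x, df):
    """P(chi-square > x), closed forms valid only for df = 1 or df = 3."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 3:
        return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("this sketch only handles df = 1 or df = 3")

table = [[30, 10], [20, 40]]  # hypothetical counts, for illustration only
chi2 = chi_square_2x2(table)

# Fisher's rule: 4 cells, minus 1 because the total is fixed,
# minus 2 because one row and one column proportion are estimated.
df_fisher = 4 - 1 - 2
# The older view: number of frequency classes minus one.
df_old = 4 - 1

p_fisher = chi2_upper_tail(chi2, df_fisher)
p_old = chi2_upper_tail(chi2, df_old)
print(f"chi2 = {chi2:.3f}; P ({df_fisher} df) = {p_fisher:.2e}; P ({df_old} df) = {p_old:.2e}")
```

The same χ² value gives a P more than ten times larger on the old count, so an association that is clearly significant under Fisher’s rule appears much weaker when the fitted margins are not allowed for.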


The t distribution, which is dealt with in Chapter V, was less
controversial, but many of its applications were of greater novelty than those of χ².
The distribution was first deduced by Gosset (“Student”) [13] in 1908 
for the purpose of testing the significance of the mean of a small sample. 
Gosset himself, in his original paper, expresses very clearly the reasons 
why such a test was required in experimental work: 


There are other experiments, however, which cannot easily be repeated 
very often; in such cases it is sometimes necessary to judge of the certainty 
of the results from a very small sample, which itself affords the only indica- 
tion of the variability. Some chemical, many biological, and most agricul- 
tural and large-scale experiments belong to this class, which has hitherto 
been almost outside the range of statistical inquiry. 
Again, although it is well known that the method of using the normal
curve is only trustworthy when the sample is “large,” no one has yet told 
us very clearly where the limit between “large” and “small” samples is to be 
drawn. 

The aim of the present paper is to determine the point at which we may 
use the tables of the probability integral in judging of the significance of the 
mean of a series of experiments, and to furnish alternative tables for use 
when the number of experiments is too few. 


Nevertheless it is to Fisher, I think, that credit must be given for 
first recognizing the fundamental nature of the advance that the t test
represented. Fisher also established with certainty the form of the
distribution (Gosset had obtained the correct form by approximate
methods), and he replaced Gosset’s $z$ by the more convenient
$t = z\sqrt{n}$ = estimate/estimated standard error of estimate.
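The relation between the two statistics can be checked numerically. The Python sketch below assumes, as in Fisher’s tables, that n denotes the number of degrees of freedom (N − 1 for a sample of N), and that Gosset’s z is the deviation of the mean divided by a standard deviation computed with divisor N, while Fisher’s t divides by an estimated standard error whose variance uses divisor N − 1. The sample values and the hypothesized mean are invented for illustration.

```python
import math
import statistics

sample = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2]  # hypothetical measurements
mu = 10.0                                    # hypothetical value under test
N = len(sample)
n = N - 1                                    # degrees of freedom (Fisher's n)
mean = statistics.fmean(sample)

# Gosset (1908): z = (mean - mu) / s, with s computed using divisor N.
z = (mean - mu) / statistics.pstdev(sample)

# Fisher: t = estimate / estimated standard error of estimate,
# with the variance estimated using divisor N - 1.
standard_error = statistics.stdev(sample) / math.sqrt(N)
t = (mean - mu) / standard_error

print(f"z = {z:.4f}, t = {t:.4f}, z * sqrt(n) = {z * math.sqrt(n):.4f}")
```

The last two printed numbers agree, since the two standard deviations differ by exactly the factor √(N/n) that turns z√N into z√n.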

Gosset’s table (of z) was included in Tables for Statisticians and 
Biometricians, but it was only with the publication of Statistical Meth- 
ods that the wide applicability of the test, which covers the whole class 
of problems in which an estimate is tested by means of an estimate of 
its standard error based on a small number of degrees of freedom, was 
brought to the attention of research workers. 

Another feature of Chapter V was to exert a major influence for the 
better in the application of statistical methods. This is the fact that 
regression is considered, as it should be, in its own right, and not as an 
offshoot of correlation. Nothing had bedevilled the interpretation of 
statistical data involving a number of variates so much as the use of 
the correlation coefficient, and in particular partial correlation. In cases 
in which the influence of one or more variates on another is under 
consideration, regression analysis is almost always more appropriate— 
the very terms “dependent” and “independent” variates indicate this. 
Regression analysis provides coefficients and equations which are im- 
mediately interpretable in real physical terms and which are unaffected 
(except for precision) by the distribution of the values of the inde- 
pendent variates. The method is therefore immediately applicable to 
experimental situations in which the values of the independent variates 
are deliberately chosen by the investigator, and to comparative work 
in which the distribution of the independent variates differs from group 
to group of the data owing to natural causes. 
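The claim that a regression coefficient is unaffected (except in precision) by the distribution of the independent variate, while the correlation coefficient is not, can be illustrated with a small constructed example. The data are hypothetical: each chosen x is paired with two responses, 2x + 1 and 2x − 1, so the least-squares slope is exactly 2 whichever x values the investigator selects.

```python
import math
import statistics

def make_data(xs):
    """Hypothetical data: for each chosen x, two responses 2x + 1 and 2x - 1."""
    x_out, y_out = [], []
    for x in xs:
        x_out += [x, x]
        y_out += [2 * x + 1, 2 * x - 1]
    return x_out, y_out

def slope(x, y):
    """Least-squares regression coefficient of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def pearson_r(x, y):
    """Product-moment correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

wide_x, wide_y = make_data(range(10))     # independent variate spread widely
narrow_x, narrow_y = make_data([4, 5])    # independent variate chosen narrowly

for label, x, y in [("wide", wide_x, wide_y), ("narrow", narrow_x, narrow_y)]:
    print(label, slope(x, y), round(pearson_r(x, y), 3))
```

The slope is 2 in both cases, while the correlation falls from about 0.985 with the wide range of x to about 0.707 with the narrow one: the correlation describes the chosen spread of x as much as the relation itself.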


CORRELATION AND THE ANALYSIS OF VARIANCE 


The remaining three chapters of the first edition deal with inter- and 
intra-class correlation, the analysis of variance, and the design of ex- 
periments. It is here that the historical influence is most apparent. The 
chapter on the correlation coefficient originally opened with the sentence: 
No quantity is more characteristic of modern statistical work than the
correlation coefficient, and no method has been applied successfully to such
various data as the method of correlation.²


though even at this point the reader is warned that 


In experimental work proper its position is much less central; . . . it is 
seldom, with controlled experimental conditions, that it is desired to express 
our conclusion in the form of a correlation coefficient. 


Because of the importance that correlation analysis had assumed it 
was natural that the analysis of variance should be approached via 
correlation, but to those not trained in the school of correlation analysis 
(of which I am fortunate to be able to count myself one) this un- 
doubtedly makes this part of the book more difficult to comprehend, 
as is admitted by Fisher in the preface to the 9th edition. Of all the 
statistical methods of analysis that Fisher has introduced, the analysis 
of variance has probably had the most profound and far-reaching in- 
fluence. It is not, however, until the reader has mastered the subject 
of intra-class correlation that the possibility of an alternative approach, 
that of the analysis of variance, is revealed with the words (Section 40) 

A very great simplification is introduced into questions involving
intra-class correlation when we recognise that in such cases the correlation merely
measures the relative importance of two groups of factors causing variation.


Fisher himself has stated that the analysis of variance is “merely a 
way of arranging the arithmetic.” This seems to me undue modesty. 
The concept of additive components of variance, and its concomitant,
the possibility of expressing the values of a variate in terms of an addi- 
tive set of parameters with the parameter for a given classification 
having a fixed value for every member of each single class of this classifi- 
cation, marked a major break with tradition, and provided the essential 
link between least squares and regression analyses and the problems 
previously treated by intra-class correlation. It also directed attention 
to the features of the data that mattered, namely the differences be- 
tween the means of the different classes, instead of concentrating at- 
tention on the usually relatively unimportant aspect of the degree of 
similarity within classes. Furthermore it provided a method of eliminat- 
ing more than one source of variation (in those cases common in 
planned experiments, in which the data are what is now known as 
orthogonal), and also automatically, as it were, provided a pooled esti- 
mate of error by means of which the individual class means might be 
compared. 
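A minimal sketch of the additive partition in the simplest one-way case: the total sum of squares about the grand mean splits exactly into a between-class and a within-class part, and the within-class mean square is the pooled estimate of error. As an aside, it also computes the usual analysis-of-variance estimate of the intra-class correlation for k members per class, (MS_between − MS_within)/(MS_between + (k − 1) MS_within), which shows the correlation as a measure of the relative size of the two sources of variation. The class data are hypothetical.

```python
import statistics

# Hypothetical data: 3 classes with k = 4 observations each.
classes = [
    [10.0, 12.0, 11.0, 13.0],
    [14.0, 15.0, 13.0, 16.0],
    [9.0, 8.0, 10.0, 9.0],
]

def one_way_anova(groups):
    """Partition the total sum of squares into between- and within-class parts."""
    all_obs = [v for g in groups for v in g]
    grand = statistics.fmean(all_obs)
    ss_total = sum((v - grand) ** 2 for v in all_obs)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ms_between": ss_between / df_between,
        "ms_within": ss_within / df_within,  # pooled estimate of error
    }

res = one_way_anova(classes)
k = 4  # observations per class
icc = (res["ms_between"] - res["ms_within"]) / (
    res["ms_between"] + (k - 1) * res["ms_within"]
)
print(res["ss_total"], res["ss_between"] + res["ss_within"], round(icc, 3))
```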





2 Significant changes were made in the wording of this passage in subsequent editions. In the fourth
edition "modern statistical" was replaced by "biometrical" and the word "successfully" was deleted.
In the fifth edition, "is" was changed to "has been."
In addition to developing the analysis of variance procedure, the
first edition of Statistical Methods made available for the first time the
relevant exact test of significance by providing a table of the z
distribution for the 5 per cent level of significance. This was expanded and a
1 per cent level added in subsequent editions. The analysis of variance
would have been of very considerable value for many purposes even
had the z test not been available (the comparison of the means of a
specific pair of classes, for example, can be made by means of the t test),
but the provision of the exact test for variation between the means of 
all classes, or any group of them, introduced a logical completeness and 
exactitude into the whole structure of the methods described in Sta- 
tistical Methods and enabled the book to be written without any sub- 
stantial reference to approximate methods. 
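Fisher's z is half the natural logarithm of the ratio of two mean squares; the variance ratio later tabulated directly as F equals e^(2z). A minimal sketch of the conversion, using as an illustration the mean squares for replicates and remainder (5.064 and 0.3324) from the re-analysis of the potato experiment given later in this article. The function names are illustrative.

```python
import math

def fisher_z(ms_numerator, ms_denominator):
    """Fisher's z: half the natural log of the ratio of two mean squares."""
    return 0.5 * math.log(ms_numerator / ms_denominator)

def variance_ratio(z):
    """Inverse conversion: the variance ratio F = exp(2z)."""
    return math.exp(2 * z)

z = fisher_z(5.064, 0.3324)  # replicates vs. remainder mean squares
print(round(z, 3), round(variance_ratio(z), 2))
```

The logarithmic scale made the quantity easier to interpolate in tables; the table itself is entered with the degrees of freedom of the two mean squares (here 2 and 22).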


THE DESIGN OF EXPERIMENTS 


The development of the analysis of variance opened the way to the 
whole of the modern technique of the design and analysis of experi- 
ments. The state of experimental design at that time, and the deduc- 
tions that it was considered reasonable to draw from the results, are 
well illustrated by an extract from the Rothamsted Report for 1918- 
1920. Under the heading of “The amount of fertilizers to use,” and after 
a discussion of the law of diminishing returns, and mention of the fact 
that on Broadbalk (the long-term wheat experiment) “the largest re- 
turn is given not by the first dressing but by the second,” it is stated: 


... a new experiment has been started to see if under ordinary conditions of
farming the highest rate of profit is given by good rather than by small 
dressings of fertilizers. The results of the first year (1920) suggest that this 
may be so. 


INCREASE IN WHEAT CROP, 1920, FROM SPRING DRESSINGS
OF SULPHATE OF AMMONIA AND SUPERPHOSPHATE

                                 Grain:                    Straw:
                                 bushels per acre          cwts. per acre
Date of application of manure    Feb. 10  March 6  May 10  Feb. 10  March 6  May 10

Single dressing Nil* 0.9 2.7 2.7 6.9 9.4 
3.7 2.7 


Double dressing 7.0 oo 21.7 _ 1 





* The correct value from the plot yields is −0.2. Presumably the alteration was made
because the presence of a negative value in the table would have made the results appear less
trustworthy to the average reader.
While the single dressing (100 lbs. sulphate of ammonia per acre) gave
no appreciable increase in grain, and only a few cwts. of additional straw,
the double dressing gave increases of no less than 7 bushels of grain and
12 cwts. of straw. Late application of the double dressing, however, was
risky, giving an unhealthy straw liable to lodge and prone to disease. 


The experiment referred to consisted of 6 plots, one for each of the 
treatments shown in the table, together with a control receiving no 
nitrogen. The variation between the yields of grain of the plots receiv- 
ing nitrogen is equivalent to a standard error of 9% per plot, which we 
now know is about what would be expected from variations in fertility 
and other sources of experimental error. The results, far from demon- 
strating that the response to the double dressing is more than double 
that to the single dressing, are not inconsistent with the hypothesis that 
there is little additional response to the double dressing. 
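The arithmetic behind this judgement can be made explicit. Assuming, as a simplification, that each plot carries an independent error with the quoted standard error of 9 per cent of the mean yield, the standard error of a linear contrast of single plots is 9 times the square root of the sum of the squared coefficients. The labels D, S and C (a double-dressing plot, a single-dressing plot and the control) are introduced here for illustration; the question whether the response to the double dressing exceeds twice that to the single dressing is the contrast (D − C) − 2(S − C) = D − 2S + C.

```python
import math

SE_PER_PLOT = 9.0  # per cent of mean yield, as quoted in the text

def contrast_se(coefficients, se_plot=SE_PER_PLOT):
    """SE of a linear contrast of independent single-plot yields (assumed model)."""
    return se_plot * math.sqrt(sum(c * c for c in coefficients))

se_pair = contrast_se([1, -1])                       # D - S: one plot against another
se_double_vs_twice_single = contrast_se([1, -2, 1])  # D - 2S + C
print(round(se_pair, 1), round(se_double_vs_twice_single, 1))
```

The standard errors come out at about 12.7 and 22.0 per cent of the mean yield, so with one plot per treatment even substantial apparent differences in response lie well within the range of experimental error.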

The subject of experimental design is only dealt with very briefly in 
Statistical Methods. There are two sections at the end of the chapter on 
application of the analysis of variance, and an example on the analysis 
of the results of an agricultural field trial in the previous chapter. 
These passages, however, contain between them all the basic principles 
which govern modern experimental design. 

It is of interest to note that the principles of design are expounded 
after the methods of analysis had been illustrated in a somewhat com- 
plex example. This was in fact the historical order in which the subject 
was developed. It was by applying the methods of the analysis of 
variance to the results of experiments which did not conform to the 
principles of good design that their defects became apparent. 

Apart from the new method of analysis provided by the analysis of 
variance procedure, Fisher’s really novel contribution to experimental 
design was his insistence on the necessity of randomization, in order to 
ensure that the estimates of error and tests of significance should be 
fully valid. Any form of systematic arrangement casts doubt on the 
estimates of error and tests of significance. In cases such as agricultural 
field trials, in which the variation of the yields from plot to plot itself 
exhibits systematic features, it may wholly vitiate them. The adoption of 
randomization does not preclude the possibility of imposing restrictions 
such as, for example, arranging all the treatments of each replicate in a 
compact block with allocation at random within the block. At the same 
time the types of restriction which are in fact capable of giving an un- 
biased estimate of error are quite limited. Thus an arrangement in 
randomized blocks in which the positions of the treatments are bal- 
anced so as to eliminate the effect of a fertility gradient across the plots 
is not capable of giving a fully valid estimate of error. On the other
hand, Latin squares, which are doubly restricted arrangements in which
each treatment occurs once only in each row and in each column,
are capable of furnishing valid estimates of error provided that a
random selection from all possible Latin squares of the given size is
made.
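A minimal sketch of the two randomization procedures described: independent random allocation of the treatments within each compact block, and a Latin square obtained by randomly permuting the rows, columns and treatment labels of a cyclic square. The second is a simplification: for sizes of 4 and above, permuting a single standard square does not reach every Latin square, so it falls short of the full random selection from all possible squares that the text requires. The function names and the seed are illustrative.

```python
import random

def randomized_blocks(treatments, n_blocks, rng):
    """Allocate all treatments at random, independently within each block."""
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)
        rng.shuffle(block)
        layout.append(block)
    return layout

def random_latin_square(treatments, rng):
    """Permute rows, columns and labels of a cyclic square (a simplification)."""
    n = len(treatments)
    square = [[(r + c) % n for c in range(n)] for r in range(n)]
    rows = rng.sample(range(n), n)
    cols = rng.sample(range(n), n)
    labels = rng.sample(list(treatments), n)
    return [[labels[square[r][c]] for c in cols] for r in rows]

def is_latin(square):
    """Check that each symbol occurs once in every row and every column."""
    n = len(square)
    rows_ok = all(len(set(row)) == n for row in square)
    cols_ok = all(len({square[r][c] for r in range(n)}) == n for c in range(n))
    return rows_ok and cols_ok

rng = random.Random(1925)  # arbitrary seed, for reproducibility only
for block in randomized_blocks("ABCD", 3, rng):
    print(" ".join(block))
sq = random_latin_square("ABCD", rng)
for row in sq:
    print(" ".join(row))
print(is_latin(sq))
```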

These are the essentials of modern experimental design. In the suc- 
ceeding years many detailed refinements have been introduced, both 
by Fisher himself and many others working in association with him. 
But it is on the foundations outlined above, which were expounded in 
the first edition of Statistical Methods, that all these refinements rest. 
And it is to the influence of Statistical Methods that much of the credit 
must be given for the rapid adoption of the new methods by practical 
agricultural and biological research workers. 

The illustrative example of Chapter VII is itself worth careful study. 
The experiment, which was carried out at Rothamsted in 1922, was one 
on 12 varieties of potatoes with three plots (patches) of each variety, 
each of these plots being split into three for two types of potash and a 
control without potash. 

The analysis of variance given in Statistical Methods is as follows: 








                                              Degrees of   Sum of     Mean
Variance due to                               freedom      squares    square

Between varieties                                  11      43.6384    3.967
Between patches for same variety                   24      17.4401     .727
Within patches:
   Potash dressing                                  1        .2911     .2911
   Sulphate v. chloride                             1        .0584     .0584
   Differential response of varieties              22       2.1911     .0996
   Differential response in patches
      with same variety                            48       8.0798     .1683
Total within patches                               72      10.6204
Total                                             107      71.6989





It will be noted that no component corresponding to blocks was in- 
cluded in the analysis. Had the three replicates been considered as 
constituting blocks the whole-plot part of the analysis would have be- 
come: 
                         Degrees of   Sum of     Mean
                         freedom      squares    square

Between replicates            2       10.1280    5.064
Between varieties            11       43.6384    3.967
Remainder                    22        7.3120    0.3324

Total                        35       61.0784





Replicates are clearly significant, indicating some form of blocking on 
the ground (though such records as are available indicate that this 
blocking was somewhat imperfect). The whole-plot error is now re- 
duced from 0.727 to 0.332. One may hazard the guess that the latter is a 
better, though doubtless somewhat imperfect, estimate of error. The 
fact that this point was not discussed is an interesting indication of the 
tentative character of the early statistical analyses of experimental 
results. 
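The arithmetic behind the two tables can be checked directly from the printed figures. Mean squares are sums of squares divided by degrees of freedom, the within-patch items add up to the within-patch total, and splitting the 24 degrees of freedom for patches into 2 for replicates and 22 for remainder reproduces the reduction of the whole-plot error from 0.727 to 0.332. The sketch uses only numbers quoted in the text.

```python
# Figures as printed in the two tables: (degrees of freedom, sum of squares).
whole_plot_original = {
    "varieties": (11, 43.6384),
    "patches within varieties": (24, 17.4401),
}
whole_plot_blocked = {
    "replicates": (2, 10.1280),
    "varieties": (11, 43.6384),
    "remainder": (22, 7.3120),
}
within_patches = {
    "potash dressing": (1, 0.2911),
    "sulphate v. chloride": (1, 0.0584),
    "differential response of varieties": (22, 2.1911),
    "differential response in patches": (48, 8.0798),
}

def mean_square(df_ss):
    """Mean square = sum of squares / degrees of freedom."""
    df, ss = df_ss
    return ss / df

old_error = mean_square(whole_plot_original["patches within varieties"])
new_error = mean_square(whole_plot_blocked["remainder"])
f_replicates = mean_square(whole_plot_blocked["replicates"]) / new_error

within_df = sum(df for df, _ in within_patches.values())
within_ss = sum(ss for _, ss in within_patches.values())

print(f"whole-plot error: {old_error:.3f} -> {new_error:.3f}")
print(f"variance ratio for replicates (2 and 22 d.f.): {f_replicates:.1f}")
print(f"within patches: {within_df} d.f., SS {within_ss:.4f}")
```

The variance ratio for replicates, about 15 on 2 and 22 degrees of freedom, is what lies behind the statement that replicates are clearly significant.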

In other respects the analysis of this experiment exhibits a remark- 
able degree of development. Points to notice are: the method of dealing 
with split plots, working in terms of sub-plot units throughout; the 
introduction of an interaction term for the interaction between the 
different factors, varieties and potash; and the sub-division of the two
degrees of freedom for potash into the average effect of potash and the
difference between the two forms.

Incidentally, the derivation of the sums of squares for this sub-di- 
vision provides an example of the way in which Fisher sometimes left 
points for his readers to worry out for themselves, considering, doubt- 
less, that it would stimulate thought. The relevant passage runs as 
follows: 


The sum of the squares of the three deviations, divided by 36, is .3495; 
of this the square of the difference of the totals for the two potash dressings, 
divided by 72, contributes .0584, while the square of the difference between 
their mean and the total for the basal dressing, divided by 54, gives the re- 
mainder, .2911. 


No clue is provided as to the derivation of the divisors 72 and 54, the 
second of which must undoubtedly have defeated many biologists and 
agriculturalists unversed in the formal algebraic theory of errors. 
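The divisors follow from the general rule for a single-degree-of-freedom contrast among treatment totals, each the sum of r sub-plots: the sum of squares for the contrast ΣcᵢTᵢ is (ΣcᵢTᵢ)²/(rΣcᵢ²). Each potash total covers r = 36 sub-plots (12 varieties × 3 patches). For sulphate against chloride the coefficients are (1, −1, 0), giving the divisor 36 × 2 = 72; for the mean of the two potash dressings against the basal dressing they are (½, ½, −1), giving 36 × 1.5 = 54. The sketch checks with exact fractions that the two contrasts together recover the full two-degree-of-freedom sum of squares. The three totals are hypothetical, since the text does not quote them.

```python
from fractions import Fraction

R = 36  # sub-plots per potash total: 12 varieties x 3 patches

def contrast_ss(coeffs, totals, r=R):
    """Sum of squares for the contrast sum(c_i * T_i) among totals of r plots."""
    value = sum(c * t for c, t in zip(coeffs, totals))
    return value ** 2 / (r * sum(c * c for c in coeffs))

# Hypothetical totals (sulphate, chloride, basal); the text does not give them.
totals = [Fraction(4123, 10), Fraction(4051, 10), Fraction(3898, 10)]

mean_total = sum(totals) / 3
ss_treatments = sum((t - mean_total) ** 2 for t in totals) / R  # 2 d.f.

ss_sulphate_v_chloride = contrast_ss([1, -1, 0], totals)  # divisor 36 * 2 = 72
ss_potash_v_basal = contrast_ss(
    [Fraction(1, 2), Fraction(1, 2), -1], totals          # divisor 36 * 1.5 = 54
)

# The two orthogonal contrasts exhaust the treatment sum of squares.
assert ss_sulphate_v_chloride + ss_potash_v_basal == ss_treatments
print(float(ss_sulphate_v_chloride), float(ss_potash_v_basal), float(ss_treatments))
```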


SUBSEQUENT ADDITIONS 


As already mentioned the only real structural alteration to the book 
is the addition to the second edition of a chapter on estimation. This 
chapter replaced and expanded the very brief account of the method
of maximum likelihood which is given as an Example in Chapter I in 
the first edition. 

This expansion increased somewhat the emphasis on the problems of 
estimation, but the chapter is more specialized and difficult than the 
rest of the book and it remained true that the main emphasis of the
more elementary parts of the book lay in the direction of tests of sig- 
nificance. I refer again to this point in the last section. 

Numerous other additions have been made from time to time, giving 
accounts of new developments which appeared to the author to be of 
interest or importance. These, however, for the most part fit closely 
into the original framework of the book, and may be regarded as ex- 
tensions of the structure already laid down, rather than as radical in- 
novations. Probably the only two additions that can really claim this 
distinction are those on the analysis of covariance (fourth and fifth 
editions) and discriminant functions (seventh edition). Other additions 
of particular interest in revealing the development of the subject are 
those on orthogonal polynomials (third and seventh editions), the exact 
test of significance of 2×2 contingency tables (fifth edition) and the
extension of the t test to give fiducial limits for the ratio of means and
regression coefficients (tenth edition). 

The sections on experimental design and analysis, the branch of the 
subject which has probably shown the greatest growth, have (apart 
from the addition on the analysis of covariance) been left almost
without alteration. It is, I think, an interesting reflection on the historical
sense of the author, and on his inclination to leave the field open to 
others to make their contributions, that although he was himself ac- 
tively working in the field at the time he thought it better to leave them 
wholly unaltered, until he felt the time was ripe for a completely new 
book on the subject. Certainly The Design of Experiments can be re- 
garded as a worthy offshoot of Statistical Methods. 

It is also interesting to note that the parts of the book dealing with 
correlation have remained almost without alteration, and now stand 
as a monument to a bygone age of statistics. 


RECEPTION OF THE BOOK 


As is only to be expected with a book that marks such a fundamental 
break with tradition, its full significance was not immediately recog- 
nized. Nevertheless the reviewers of the first edition did perceive that 
the book was an important one, and they confined their criticisms 
mainly to lack of due deference to authority and to questions on
intelligibility and presentation. As might be expected, the absence of
mathematical proofs was felt by many to be a defect, either because it 
made the book difficult to follow, or, more strangely, because their ab- 
sence would prevent the reader from verifying for himself the validity 
of the methods proposed. Even sixteen years later there still existed 
a desire for the mathematical “proof.” (See, for example, M. G. 
Kendall [8].) 

Fisher himself commented on this point in the prefaces to the later 
editions. Thus he states in the preface to the 9th edition: 

The practical application of general theorems is a different art from 
their establishment by mathematical proof. It requires fully as deep an 
understanding of their meaning, and is, moreover, useful to many to whom 
the other is unnecessary. 


That such understanding does not flow from the mathematical proof is 
sufficiently demonstrated by the number of advanced textbooks in 
mathematical statistics in existence today which establish the pro- 
cedure of the analysis of variance appropriate to replicated experiments 
and analogous material without reference to randomization. 

Apart from this the main defect of the early reviews was their as- 
sumption that the applications of small sample theory were solely con- 
fined to small samples, and their consequent implication that the book 
was of limited interest to the general statistician. Thus the review in 
Nature (Anon.) [1] states: “It treats of the interesting and important 
subject of small samples in statistical work.” That in Science Progress 
(“E.S.P.”) [3] states: “The book is chiefly concerned with the best
methods of handling small samples.” That in the Journal of the Royal 
Statistical Society (“L.I.”) [10], though it indicates that many of the 
methods were in fact not confined to small samples, concludes: “The 
book will undoubtedly prove of great value to research workers whose 
statistical series necessarily consist of small samples....” In the 
British Medical Journal (Anon.) [2] it is stated: “Since in the kind of 
biological research with which Mr. Fisher has had to deal practically 
small samples only are usually available, he has given more attention 
to the particular methods applicable to small samples than authors of 
most textbooks have deemed necessary.” Yule in his 8th edition (1927) 
refers to Statistical Methods as “a laboratory handbook rather than a 
textbook, [which] brings together in convenient form for the research 
worker the numerous methods developed, mainly by [the author], 
with special reference to small samples.” 

In actual fact many of the methods described in Statistical Methods 
are relevant to the treatment of large samples. The essential point is 
that even when the data consist of observations on a large number of
separate individuals these often require to be grouped according to a
relatively small number of classes. Tests of significance involving these
classes frequently involve small sample theory. The examples included 
in the book are, indeed, fairly evenly distributed over what would be 
described as small and large samples, as is shown by the following table: 


Size of sample     No. of examples

8-20                      8
21-100                   16
101-500                   6
501-1000                  4
Over 1000                16


The continued and growing demand for the book is best indicated by 
the numbers of copies printed for the various editions. These have been 
kindly supplied by the publishers, and are as follows: 


Edition        Year    Copies printed
1st Edition    1925        1050
2nd Edition    1928        1250
3rd Edition    1930        1500
4th Edition    1932        1500
5th Edition    1934        1500
6th Edition    1936        2000
7th Edition    1938        2000
8th Edition    1941        2250
9th Edition    1944        2000
10th Edition   1946        3000
Reprinted      1948        1500
11th Edition   1950        7500


In all nearly 20,000 copies have been sold during the first 25 years 
of the book’s existence. The rate of sale during the latter half of this 
period has been very constant at about 1000 copies a year. The book 
has also been translated and published in French, Italian, and Spanish. 
It is being translated into German and into Japanese and publication 
should take place within one or two years. No figures for the distribu- 
tion of sales of the English editions over the different countries are 
available, but the publishers state that the early editions were sold 
mainly, if not entirely, in the United Kingdom, and that it would be 
reasonable to assume that at present practically half of each new edition 
is sold abroad to various countries, principally to the United States of 
America. 
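The printing figures in the table above can be totalled as a simple check. Summing them gives 27,050 copies printed, of which 19,550 were printed before the 1950 eleventh edition; the text's figure of nearly 20,000 refers to copies sold, not printed.

```python
# Copies printed, as listed in the table above: (edition, year, copies).
printings = [
    ("1st", 1925, 1050), ("2nd", 1928, 1250), ("3rd", 1930, 1500),
    ("4th", 1932, 1500), ("5th", 1934, 1500), ("6th", 1936, 2000),
    ("7th", 1938, 2000), ("8th", 1941, 2250), ("9th", 1944, 2000),
    ("10th", 1946, 3000), ("reprint", 1948, 1500), ("11th", 1950, 7500),
]

total_printed = sum(copies for _, _, copies in printings)
before_11th = sum(copies for _, year, copies in printings if year < 1950)
print(total_printed, before_11th)
```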

Many requests (which have always been granted) have also been 
made for permission to reproduce in whole or in part tables and other 
matter first published in Statistical Methods. The basic tables have been 
reproduced almost without alteration in Statistical Tables for Biological, 
Agricultural and Medical Research, now in its third edition. 

These facts provide additional evidence, if any is needed, of the 
wide influence of the book. 
PRESENT TRENDS


In conclusion we may ask, in the light of present-day statistical 
teaching and practice, how far the methods embodied in Statistical
Methods have found acceptance, whether they have been rightly under- 
stood and applied, and whether in the light of experience of their use 
extension or modification is required. 

A full discussion of this point is beyond the scope of this article, but 
in the matter of experimental design, which, as I have tried to indicate, 
is the most novel contribution of Statistical Methods, it can be said with- 
out hesitation that the new methods have been completely accepted by 
biological and agricultural research workers, and that they are rapidly 
spreading through other branches of scientific and technical research in 
which the variability of the experimental material necessitates refined 
techniques. Their introduction has resulted in an immense gain in the 
accuracy and certainty of experimental results, and there is no reason 
to doubt that the development, which is still continuing, of techniques
appropriate to the very varied problems and situations met with in 
scientific experimentation, is on the right lines. As examples, we may 
instance work in recent years on long-term change-over trials in agri- 
culture, biological assay, and industrial experimentation and quality 
control. 

On the other hand the emphasis given to formal tests of significance 
throughout Statistical Methods, and to a great extent also in The Design
of Experiments, has had two consequences which are not wholly satis- 
factory. In the first place it has resulted in what seems to me to be an 
undue concentration of effort by mathematical statisticians on investi- 
gations of tests of significance applicable to problems which are of little 
or no practical importance. Second, and more important, it has caused 
scientific research workers to pay undue attention to the results of the 
tests of significance they perform on their data, particularly data de- 
rived from experiments, and too little to the estimates of the magnitude 
of the effects they are investigating. 

Historically this situation is understandable. When Statistical Meth- 
ods was written the methods used for testing significance were, as we 
have seen, in the utmost confusion. In the interpretation of their results 
research workers in particular badly needed the convenience and the 
discipline afforded by reliable and easily applied tests of significance. 
The example, quoted above, of an early Rothamsted experiment, shows 
how important this discipline is. Nevertheless the occasions, even in re- 
search work, in which quantitative data are collected solely with the 
object of proving or disproving a given hypothesis are relatively rare. 
Usually quantitative estimates and fiducial limits are required. Tests
of significance are preliminary or ancillary. 

The emphasis on tests of significance, and the consideration of the 
results of each experiment in isolation, have had the unfortunate conse- 
quence that scientific workers have often regarded the execution of a 
test of significance on an experiment as the ultimate objective. Results 
are significant or not significant and that is the end of it.

Research workers, therefore, have to accustom themselves to the fact 
that in many branches of research the really critical experiment is rare, 
and that it is frequently necessary to combine the results of numbers of 
experiments dealing with the same issue in order to form a satisfactory 
picture of the true situation. This is particularly true of agricultural 
field trials, where in general the effects of the treatments are found to 
vary with soil and meteorological conditions. In consequence it is abso- 
lutely essential to repeat the experiment at different places and in 
different years if results of any general validity or interest are to be 
obtained. In such circumstances a number of experiments of moderate 
accuracy are of far greater value than a single experiment of very high
accuracy.

The combination of the results of groups of experiments on the same 
issue introduces problems of statistical technique, particularly in the 
estimation of errors, but also to some extent in the estimation of the
effects themselves, which are not met with in the analysis of a single
experiment. Uncritical application of the analysis of variance pro- 
cedure is likely to give uninformative and sometimes misleading results. 

The same situation is met with in the analysis of observational data. 
Multiple classifications are frequently met with in such data, which at 
first sight appear amenable to the analysis of variance technique. How- 
ever, lack of orthogonality introduces many complications, and al- 
though the theory of the subject is well understood, practical methods 
of analysis which do not involve excessive computation are not avail- 
able. Nevertheless the model provided by the simpler case where the 
data are orthogonal is of immense value in indicating the objectives to 
strive for, and we may confidently expect rapid development in this 
field. 

We may expect also to see considerable developments in the practical 
applications of multivariate analysis. Here again practice has lagged 
behind theory because of the large amount of computation required. 
Just as the practical development of experimental design was made 
possible by the introduction of the desk calculating machine, which
enabled the results to be analyzed without undue labor, so we may 
expect that recent developments of electronic and relay calculators,
and the wider availability of punched card apparatus, will result in a 
corresponding development of multivariate analysis, so that in a few 
years’ time the sections of Statistical Methods on covariance and dis- 
criminant functions will bear the same relation to this branch of the 
subject as do the sections on experimental design to present practice in 
that field. 
THE IMPACT OF R. A. FISHER ON STATISTICS*


Harold Hotelling
University of North Carolina 


IN THIS silver anniversary year of the publication of “Statistical
Methods for Research Workers” it is good to look back upon the
state of statistical methods and theory before this remarkable book, 
and to consider the consequences for statistics of it and of the other 
contributions of its distinguished author. In doing this let us deliber- 
ately set aside the important work of R. A. Fisher in genetics, and the 
illumination his published papers and discussions have shed upon a host 
of diverse questions, and consider only his impact on statistical meth- 
ods, statistical inference, the statistical design of experiments, and the 
fundamental philosophy of statistics. Since it is to be hoped that in 
spite of his present preoccupation with genetics his contributions to 
these basic aspects of statistics are not yet complete, this is not an 
attempt even to assess the whole of Professor Fisher’s work in this 
field, but is concerned with the transition from the old to the new 
methods and views, and specifically with his part in bringing about the 
change. I append no bibliography, since Fisher has set an example 
worthy of emulation by including a list of his own publications in this, 
his book of widest circulation, and since partial collections of his papers 
have been published in book form.¹ In short, these remarks lack many
features of a good obituary notice, in addition to the fact that the chief 
figure in them is still highly active and in the best of health. 

Before Fisher was Karl Pearson, and before him were Gauss and 
Laplace, the real co-founders of the double science of statistics and 
probability, with Gauss focusing mainly on the statistical and Laplace 
on the probability aspects of our Janus-faced subject. Both Gauss and 
Laplace were truly great mathematicians and astronomers, and Gauss 
was also a great physicist, a surveyor, and other things. Karl Pearson
began as a mathematical physicist, specializing in the theory of
elasticity, but with the enthusiastic rise of biology that followed Darwin
was drawn into his series of “Mathematical Contributions to
Evolution.” These memoirs were the starting-point for nearly half a century
of intense and fruitful statistical activity by Pearson, during which he 
became the acknowledged leader in the field and exerted immense
influence on statistical teaching, thought and practice. His ideas,
repeated often uncritically and sometimes erroneously at second, third,
and even fifth hand, became the core of a vast number of textbooks on 
statistics. The journal Biometrika and the school of statistics which he 
founded are flourishing today as a center from which emanate a great 
volume of basic contributions, and are presided over by his son Egon 
Pearson. 

To understand what took place in statistics it is necessary to recall 
what was happening to mathematics in the century from Laplace’s 
Théorie Analytique des Probabilités to Fisher’s first memoirs. In general, 
applied mathematics flourished in England while Continental mathe- 
maticians were building up a magnificent structure of pure reason. 
While Green and Stokes were introducing in England the theorems that 
bear their names for the study of mathematical physics, Cauchy and 
Weierstrass were creating the theory of functions and providing exact 
definitions and rigorous arguments applicable to infinite processes. 
Riemann, building on Gauss’s differential geometry, laid the founda- 
tions of modern rigorous metric geometry, and Christoffel’s elaboration 
of these ideas in 1869 laid the basis for the present century’s relativity 
theory. Important British contributions were made to algebra as well 
as to physics, but the subtler reasoning associated with the infinite was 
for the most part done on the Continent. This geographic distribution 
of emphasis has now been altered, but in considering the development 
of statistics in the first quarter of the twentieth century it is well to 
bear in mind the British tradition of mathematical physics, which 
sought primarily for useful formulae and could ignore some incomplete- 
ness in the details of their deductions, some over-condensation and 
abridgement of formal rigor, with a view to speeding the applications 
of the mathematical results obtained, and evidently in many cases also 
with a feeling that any inadequacies in the mathematical reasoning 
would either be corrected or rendered unobjectionable by the results of 
empirical observations. 

The contrast in national attitudes is well brought out by two very 
different publications that appeared near the end of the century from 
Laplace to Fisher. The Calcul des Probabilités of Henri Poincaré, of 
which the second and final edition appeared in 1912, illustrated the 
lucid precision and exactitude of that supreme master of mathematics. 
It contains brilliant passages, and was the outstanding work of its time 
on probability. Yet it introduced nothing of importance for the treat- 
ment of observations, no one refers to it today, and it has become 
simply one of the line of books bearing the same title. 
In contrast to Poincaré’s brilliance the contributions to probability
published in 1908 and 1912 by the chemist W. S. Gosset, writing under
the pseudonym “Student” to conform to the anonymity then enforced 
by the Guinness firm on its scientists, seem bumbling affairs indeed. 
Gosset set out to find the exact distributions of the sample standard 
deviation, of the ratio of mean to standard deviation, and of the corre- 
lation coefficient. He had trouble with his mathematics. Undaunted, 
he resorted to experiment, and after an extended season of drawing 
randomly shuffled cards and making computations of sample values 
from the results, formed empirical frequency distributions. Following 
the fashion of the time, Gosset fitted Pearson-type frequency curves 
by the method of moments. Later the fitting of frequency curves be- 
came much less fashionable, and the method of moments, as Fisher 
showed in 1921, is often an inefficient one. Gosset, however, got results 
which Fisher later proved correct, in spite of the fact that there was no 
guarantee in advance that the distributions belonged to any Pearson 
type. Later it was found that a part of Gosset’s results had been ob- 
tained mathematically in 1875 by the German astronomer Helmert. 
Altogether the papers of this anonymous “Student” must have seemed 
a pretty dismal flop to any disciple of Poincaré who might somehow 
have been induced to look at them. Yet “Student’s distribution” is 
today a basic tool of a multitude of statisticians who will never have 
any use for the beautiful but relatively inconsequential work of Poin- 
caré in probability; and what is more important, “Student” inspired 
Fisher. 

Ronald Aylmer Fisher was born in London in 1890. He is said to have 
been a precocious child and to have mastered such subjects as spherical 
trigonometry at a very early age. Like Gauss, Laplace, Pearson and 
Gosset he was attracted to physical science, and received his B.A. de- 
gree in astronomy from Cambridge in 1912. He found the theory of 
errors which astronomers had cultivated a major interest, and later 
published a comparison of the distributions of the mean error (mean 
deviation) and the mean square error (standard deviation). In 1912 he 
published his first contribution, in which he proposed maximum likeli- 
hood as a method of fitting frequency curves. 

These two papers represented the first stages of his work on statisti- 
cal estimation, which took on a more general form in his impressive 
1921 memoir, “On the mathematical foundations of theoretical statis- 
tics,” in which he introduced definitions of consistency, efficiency, and 
sufficiency, gave a new derivation of Sheppard’s corrections, advocated 
maximum likelihood estimates, worked out asymptotic standard errors 
for them, and dethroned the method of moments from its former pre-eminent
place by showing its inefficiency in many cases for fitting
Pearson curves. Some typographical and other errors in this paper were 
corrected in Fisher’s 1925 “Theory of statistical estimation,” with 
further improvements in his 1938 Calcutta lectures. Further work on 
estimation appeared in the ninth chapter added in 1928 to Statistical 
Methods for Research Workers, and in several later papers. In none of
these writings is the reasoning complete and mathematically rigorous 
in all details, which have had to be filled in with some slight corrections 
and with much expenditure of skilled effort, by later and more mathe- 
matical writers, including Neyman, Doob, Cramér, Rao, Dugué, and 
still more recent investigators whose studies of the principles of sta- 
tistical estimation continue to appear in each issue of the Annals of 
Mathematical Statistics. 

Gauss had made a comparison of the standard errors of various esti- 
mates of the standard deviation based on means of different powers of 
the observations, and Gauss and Laplace had derived the method of 
least squares from the criterion of unbiased estimates of minimum 
variance. These may perhaps be taken as the beginnings of the theory 
of estimation. It is nevertheless beyond question that Fisher has pro- 
vided the principal impetus for the generalized subject, and has been 
the direct or indirect inspiration of most of the work in the field since 
Gauss, and of much more that is to come. He has achieved his objective
of shaking the complacent assumption that the method of
moments is the only method to be used for curve-fitting, and of making 
clear the multiplicity for this and other problems of available and 
plausible estimates, among which a choice can be made on the basis of 
sound general criteria. If this lesson has not yet penetrated some of the 
textbooks and courses, the fault is not Fisher’s. Many students no 
doubt still get the impression that the choice between mean, median, 
and mode, or between mean deviation and standard deviation, is to be 
made on the basis either of arithmetical convenience or of the statis- 
tician’s feeling about the suitability of heavy weighting for the outlying 
observations. However, textbooks, and presumably courses in statistics, 
have recently been much less outspoken in proclaiming these hoary 
fallacies, and thus show Fisher’s influence. 

After the introduction of the concept of efficiency in estimation and 
the specific technique of maximum likelihood, Fisher’s second major 
contribution to the statistics of our time is the emphasis on and the dis- 
covery of exact distributions and exact probabilities for testing hy- 
potheses. His first piece of work in this field was his paper of 1915 in 
Biometrika on the distribution of the correlation coefficient. Earlier
efforts by Karl Pearson and others had produced probable error and 
moment formulae having only asymptotic validity and of dubious ap- 
plicability to the samples arising in practice. One formula that has 
echoed through many textbooks is 


$$\sigma_r = \frac{1 - r^2}{\sqrt{N}}$$


Here N is the sample size, r means the sample correlation coefficient on 
the left and the population correlation coefficient on the right, and use 
of the formula presupposes a random sample from a normally dis- 
tributed population, together with the unverified assumption that the 
distribution of the sample correlation is to a sufficiently close approxi- 
mation normal in form. Even with all these assumptions, the left and 
right sides of the alleged equation are not equal to each other. The 
approximation is somewhat improved if N on the right is replaced by 
the number of degrees of freedom n, which in the simplest case is N—1 
and in case of partial correlation is further diminished by the number 
of variables eliminated. Using Fisher’s important innovation of different
designations for the sample “statistic” (Fisher’s word), which in
this case is r, and the population parameter, which is for this case
denoted by ρ, we can improve the formula by writing it


$$\frac{1 - \rho^2}{\sqrt{n}}$$


This is still not the true standard error σ_r, but is the first term of a
series of powers of 1/√n, and is often a bad approximation. Since also
the actual frequency curve for r when ρ is not zero turns out to be
decidedly skew, it will be seen that the textbook formula has certain 
mathematical deficiencies. 
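A small simulation makes these deficiencies easy to see. The following Python sketch is an illustration under assumptions of our own choosing: it uses only the standard library, and the sample size N = 10, ρ = 0.8, 20,000 replications and the fixed seed are arbitrary. It draws random samples from a bivariate normal population, computes r for each one, and compares the empirical standard deviation and skewness of r with the first-term approximation (1 − ρ²)/√n, taking n = N − 1.

```python
import math
import random
import statistics

def sample_r(n_obs, rho, rng):
    """Draw n_obs pairs from a standard bivariate normal with correlation rho; return sample r."""
    xs, ys = [], []
    for _ in range(n_obs):
        x = rng.gauss(0.0, 1.0)
        # y = rho*x + sqrt(1 - rho^2)*e gives corr(x, y) = rho in the population
        y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate(n_obs=10, rho=0.8, reps=20000, seed=1):
    """Return mean, sd and skewness of simulated r, plus the first-term approximation."""
    rng = random.Random(seed)
    rs = [sample_r(n_obs, rho, rng) for _ in range(reps)]
    mean_r = statistics.fmean(rs)
    sd_r = statistics.stdev(rs)
    # Sample skewness: third central moment divided by sd cubed
    skew = statistics.fmean((r - mean_r) ** 3 for r in rs) / sd_r ** 3
    # First term of the series, with n = N - 1 degrees of freedom
    approx = (1.0 - rho ** 2) / math.sqrt(n_obs - 1)
    return mean_r, sd_r, skew, approx

if __name__ == "__main__":
    mean_r, sd_r, skew, approx = simulate()
    print(f"mean r = {mean_r:.3f}, sd r = {sd_r:.3f}, approx = {approx:.3f}, skew = {skew:.2f}")
```

With ρ = 0.8 the simulated distribution of r should show clearly negative skewness, its long tail running toward zero, which no symmetric normal approximation can represent.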

The student who ignores or surmounts these purely mathematical 
difficulties is still faced by the logical problem of what to put for the 
unknown parameter ρ when all he actually has is the sample value r. If
he substitutes r for ρ, in conformity to what seems to have been the
usual practice, he thereby introduces an additional bias which for some
important cases, particularly when ρ is near zero, may assign to the
standard error a value only a half or less of the true value. This bias, 
which operates in the same direction as the use of N instead of n, and 
also in the same direction as the neglect of the second-order term in the 
series, must have led to over-estimation of the accuracy of inferences 
drawn from correlation coefficients in countless cases. In future editions
of textbooks containing the first formula above it might well be
surrounded by a heavy black border, or accompanied by a skull and
crossbones such as appear on labels for poisons, with a caution that it is
for external, historical, or minatory use only, and must not be taken
internally.

Fisher obtained the exact distribution of the correlation coefficient 
in random samples from normal populations by an ingenious combina- 
tion of geometric and analytic methods. Unfortunately the function is 
not in general easy to compute nor simply expressible in terms of 
previously tabulated functions. Approximations were therefore neces- 
sary. 

The detailed study of Fisher’s correlation distribution was under- 
taken at once with a formidable mathematical armamentarium by H. 
E. Soper, A. W. Young, B. M. Cave, Alice Lee, and Karl Pearson. 
Their paper in Biometrika for 1917 is well worthy of careful study by 
mathematical statisticians as inspiration for work yet to be done on 
other statistical distributions. It consists of 51 pages of mathematical 
text, 35 of tables, and five of plates. In it are several series for Fisher’s 
distribution and for its moments, besides recursion formulae, various 
analytic expressions, graphs, tables, and beautiful photographs of 
models showing the family of distributions of r for different values of ρ.
One of the methods, related to that of steepest descents, which ought 
to be more widely known to statisticians than it actually is, was used 
by these authors to find the series of powers of 1/n for the distribution. 
We can now prove this series divergent, but that does not necessarily 
mean it is bad.²

With the questions of pure mathematical analysis more or less disposed
of, the problem arose as to how the correlation distribution should be
used for purposes of statistical inference. This was a critical point for
the development of the modern theory of testing and estimation. Fisher
in his original paper introduced the maximum likelihood estimate ρ̂ of ρ;
it is that function of r and n which, substituted for ρ, maximizes the
density f_n(r, ρ) of r, and is generally a little smaller in absolute value
than r. It is to be noted that the idea of maximum likelihood is applied
in two different ways to get different results. If it is applied to the
normal distribution of the original independent pairs to estimate the
correlation along with the means and variances, the estimate obtained
for ρ is r, which differs from ρ̂ by a term of order 1/n.

² In this connection, I may perhaps point out that a form never yet published,
shortly afterward of Miss David's table of Fisher's distribution, which makes all other numerical work
study, and by displaying the singularities of the analytic function of n for all negative integral multiples
of 1/2 proves the divergence of the power series in all cases.

Bayes’ theorem and inverse probability were still potent in 1917 as 
the generally assumed background for inductive inference. Soper, 
Young, Cave, Lee, and Pearson turned naturally to these ideas when 
at the end of their purely mathematical study they attempted to show 
how to use Fisher’s distribution in practice. Referring to a collection of 
observed anatomical correlations and making a rough fit of a frequency 
distribution to them, they used this as an a priori probability distribution
to combine with Fisher’s function in calculating an estimate for ρ
and probability limits for it. Fisher’s statistic ρ̂ they described as a
modal estimate based on an assumed uniform a priori probability density
for ρ. This was sharply repudiated by Fisher.

In the first volume of Metron, published in 1921, Fisher denounced 
the conclusions reached in the cooperative study regarding certain 
population correlation coefficients as preposterously inconsistent with 
the data, and asserted that they reflected the prejudices of the com- 
puter rather than the objective evidence of the observations. He called 
for the use of statistical methods whose conclusions should rest entirely 
on the explicit evidence of a specified set of observations, and not on 
vague a priori probabilities. Without rejecting the possibility of com- 
bining evidence from different sources, he insisted on tests and esti- 
mates based strictly on observations and not at all on assumed a priori 
probabilities. His continued insistence has now apparently led to an 
almost universal assent to the idea, which seemed so strange to the 
statisticians of 1915, that for the codification of accumulating informa- 
tion and the drawing of inferences from observations there are ac- 
ceptable alternatives to Bayes’ theorem. 

It should however be observed that while a priori probabilities have 
gone out, “models” have come in, and that the most interesting and 
widely used statistical methods of Fisher and others assume such con- 
ditions as normality and independence of underlying distributions, 
conditions which often do not exist even approximately and may never 
hold exactly. To complete the work which Fisher has begun with the 
expulsion of inverse probability from the house of statistics we should 
find ways to allow accurately for the risks involved in the fixed
hypotheses. Three ways suggest themselves: rank order statistics;
transformations to normalize distributions and eliminate sources of
dependence such as trend or seasonal variation; and design of experiments
to assure normality as well as independence.

After leaving Cambridge Fisher became a statistician for a London 
bond house, and then from 1915 to 1919 taught mathematics in public 
schools. His research in genetics while thus teaching resulted in a highly 
original mathematical treatment of the correlations between related 
persons published in the Transactions of the Royal Society of Edinburgh 
in 1918. It would be interesting to know whether there is any parallel 
case in recent years of a young American high school teacher making 
such a remarkable contribution to science. Some of the men who taught 
me in a public high school in Seattle became known later as research 
scientists and university professors of science, but I have the impression 
that this is now out of fashion, and that high school teachers are sup- 
posed to pay attention to children rather than subjects. 

Sir John Russell, as director of the Rothamsted Agricultural Experi- 
mental Station, had in hand the longest series in the world of crop 
yields with matching weather observations, a series complete since 
1853, and felt it desirable that inferences should be made from these 
records. With some difficulty he persuaded Fisher to leave the class- 
room for a temporary stay at Rothamsted in 1919. The six-month en- 
gagement stretched out to fourteen years, during which Fisher intro- 
duced major innovations in statistical methods, besides serving as con- 
sultant and taking a hand in the widest imaginable variety of investiga- 
tions. “The influence of rainfall on the yield of wheat at Rothamsted,” 
published in 1924, includes innovations in time series analysis and 
multiple correlation which seem to be unknown to this day to most 
statisticians not engaged in agricultural meteorology. This paper should 
be required reading—though it is hard reading—for every economic and 
other statistician using time series. 

Of all the novel ideas which Fisher has introduced into statistics 
those which seem today to have widest use are the analysis of variance, 
which he introduced in 1924 after some preliminary work with the dis- 
tribution in 1922, and the systematic design of experiments, of which 
parts appeared in the original 1925 edition of “Statistical Methods for 
Research Workers.” His Design of Experiments (first edition 1935) gives 
a beautiful exposition of his principles of randomization and replication, 
stratification, confounding, and factorial experiments. A whole new 
field for application of combinatorial mathematics has arisen from this 





IMPACT OF R. A. FISHER 43 


book, and the subject is being vigorously pursued both as to mathe- 
matical theory and in widespread practice. 

Through the analysis of variance Fisher was able to consolidate into 
one technique, using one compact set of tables, methods appropriate 
for a wide variety of problems which had previously been attacked by 
disconnected and approximate methods. Examples of the scattered 
methods now replaced by analysis of variance are intraclass correlation, 
correlation ratio tests, and Blakeman’s criterion of linearity. Great ex- 
tensions of the analysis of variance have of course taken place since its 
introduction. 

By reviving Gauss’s practice of using the number of degrees of freedom
rather than the sample number as the denominator for sums of
squares to obtain estimates of variance, and by working out the exact 
distribution of the ratio of independent variance estimates of this kind 
from a normally distributed population, Fisher was able to cut down 
enormously the magnitude of the tables needed for exact tests of sig- 
nificance, and thus to bring exact tests within the practical range of 
routine statistical work. Where Karl Pearson’s gospel had been “Every 
result must be accompanied by its probable error or be disregarded as 
worthless,” Fisher substituted “exact probability test” for the approxi- 
mations, often excessively crude, which had previously had to be used 
for the probable errors. 
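A minimal Python sketch of the divisor convention just described: each variance estimate divides a sum of squares by its degrees of freedom n = N − 1 rather than by N, and two independent estimates are then combined into a ratio, the kind of quantity whose exact distribution under normality makes an exact test possible. The function names and data are hypothetical, and the step of reading the ratio against tables is left out.

```python
import statistics

def variance_estimate(values):
    """Sum of squared deviations divided by the degrees of freedom n = N - 1."""
    mean = statistics.fmean(values)
    ss = sum((v - mean) ** 2 for v in values)
    return ss / (len(values) - 1)

def variance_ratio(a, b):
    """Ratio of two independent variance estimates, with their degrees of freedom."""
    return variance_estimate(a) / variance_estimate(b), len(a) - 1, len(b) - 1

a = [4.0, 6.0, 8.0]            # hypothetical sample: SS = 8, df = 2, estimate 4.0
b = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical sample: SS = 10, df = 4, estimate 2.5
ratio, df1, df2 = variance_ratio(a, b)
print(ratio, df1, df2)  # 1.6 2 4
```

Python's `statistics.variance` uses the same N − 1 divisor, so it could stand in for the helper here.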

The connotations of “exact probabilities” as used by Fisher are that 
these probabilities must depend only on the observations (together 
with maintained qualitative assumptions such as normality), not on 
a priori probabilities; that they must be mathematically correct, not 
asymptotic or other approximations of unknown accuracy; and that 
they must be simple numbers, not functions of unknown parameters. 
As an example of the last point, the use of Student’s distribution avoids 
the old fallacy of treating the ratio of a sample mean to its estimated 
probable error as normally distributed. The last requirement also rules 
out the standard error formula for the correlation coefficient even if the 
previous one does not, except where the value of ρ is specified by
the hypothesis under test. Fisher has attempted to meet this need by 
means of the transformation 


$$z = \tanh^{-1} r, \qquad \zeta = \tanh^{-1} \rho$$


for testing whether two sample correlations differ significantly 
from each other, without naming a single specific value for both, 
and for other such purposes. The point is that z is distributed in 
an almost normal distribution about a center which closely 
approximates ζ, and with a variance which is almost exactly 1/n
and therefore nearly independent of the unknown parameter ρ.
Testing two independent values of r for their difference is therefore 
carried out by treating the corresponding values of z as normally dis- 
tributed with known variances. This seems good enough for ordinary 
purposes, but raises the deeper question whether what Fisher set out to 
do is actually possible with complete exactness. Such questions as this 
are suggested by close scrutiny of several of the techniques provided by 
Fisher’s remarkable insight, and are engaging the attention of members 
of a generation trained in the Continental mathematical tradition of 
rigorous scrutiny and close mathematical argument from explicit 
postulates. 
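The comparison of two correlations by way of z can be sketched in Python. This is a minimal illustration, not Fisher's own computing scheme. The function names are hypothetical. The variance of z is taken as 1/(N − 3) for a sample of N pairs, which is the form most often quoted in later textbooks, rather than the 1/n of the text; both are approximations. The two-sided probability comes from the normal distribution via `math.erfc`.

```python
import math

def fisher_z(r):
    """Fisher's transformation z = tanh^{-1} r."""
    return math.atanh(r)

def compare_correlations(r1, n_obs1, r2, n_obs2):
    """Approximate normal test for the difference of two independent sample correlations.

    Assumption: var(z) is taken as 1/(N - 3) for a sample of N pairs.
    Returns the standardized difference and a two-sided normal probability.
    """
    z1, z2 = fisher_z(r1), fisher_z(r2)
    se = math.sqrt(1.0 / (n_obs1 - 3) + 1.0 / (n_obs2 - 3))
    d = (z1 - z2) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|d|)) = erfc(|d| / sqrt(2))
    p_two_sided = math.erfc(abs(d) / math.sqrt(2.0))
    return d, p_two_sided

d, p = compare_correlations(0.6, 28, 0.3, 28)
print(f"standardized difference = {d:.3f}, two-sided P = {p:.3f}")
```

For the two hypothetical samples of 28 pairs the standardized difference is about 1.36. That falls well short of conventional significance, even though the sample correlations differ by a factor of two.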

The whole theory of testing and estimation has of course had a fresh 
start with the work of Neyman and E. S. Pearson that began about 
1928, with a resultant reformulation and modification by an increasing 
group of workers of many of the ideas introduced originally by Fisher
as revolutionary improvements of the state of statistics before his time. 
Neyman’s work has been particularly valuable in making explicit the 
concept of the “power function” and its role in testing and estimation. 
Fisher however published the power function of the analysis of vari- 
ance in 1928 at the end of his paper on the general distribution of the 
multiple correlation coefficient, and gave in a different analytic form 
that of the Student t-test in his introduction to the British Associa- 
tion’s 1931 table of generalized Hermite functions. 

It is not possible within the necessary limitations to speak of Fisher’s 
work in multivariate analysis, of his improvements of mathematical 
tables and methods of interpolation, of the controversial Fisher- 
Behrens problem and fiducial probability, of the argument over the 
number of degrees of freedom in a contingency table in which Fisher 
seems to have been much closer to the truth than his opponents, 
of his exploitation of modal expansions similar to those of the method of 
steepest descents, of his contributions to harmonic analysis, his inven- 
tion of sample cumulants as unbiased estimates and development of 
cumulants of cumulants, of his exact tests for fourfold tables and for 
skewness of distributions, of his collaboration with Tippett on distribu- 
tions of extreme values, of his 1934 contribution to the theory of games 
with his demonstration of the need of deliberate randomization of be- 
havior for successful play; nor of many other things. Nor is it possible 
to give any impression of his generous contribution of ideas and in- 
spiration to younger workers, or his tireless advice and assistance to 
workers needing help in the design of their experiments or the interpre- 
tation of their data. 





Dr. Fisher became Professor of Eugenics at London in 1933, and 
Professor of Genetics at Cambridge in 1943. Recognition of his novel 
ideas was slow in England, slower in America. Editors ought perhaps 
not to accept volunteer reviews of books, but this was the way the 
reviews of the first editions of Statistical Methods for Research Workers 
and of The Design of Experiments reached the Journal of the American 
Statistical Association. Apart from these contributed reviews I find no 
mention in any American book or journal of Statistical Methods for 
Research Workers during its first five years, and only a very few allu- 
sions to Fisher. His name does not seem to appear in the first volume 
of the Annals of Mathematical Statistics, published in 1930, though by 
the following year the impact of his work upon mathematical statis- 
ticians became clear. A summer’s teaching in 1931 brought marked 
attention to Fisher in Iowa and Minnesota, and George W. Snedecor’s 
1934 brochure on the analysis of variance helped to focus attention, 
first in agricultural colleges, then in biological departments all over 
the country, on the indispensable nature of Fisher’s work. The Har- 
vard tercentenary invitation and honorary doctorate in 1936 brought 
much further publicity to Fisher in this country, and some of this 
helped the process of displacing unsound statistical methods by his 
better ones. The process had not gone very far in 1940, but has con- 
tinued during and since the war. 

Textbooks are of course slow to change. It was implicit in the ideas 
published in the first edition of Statistical Methods for Research Workers 
that all textbooks on statistics would have to be replaced by new ones 
in which the old standard sequence of topics would have to be pretty 
completely thrown overboard and replaced by an entirely new kind of 
organization of the subject. But not many authors or teachers saw this 
in 1925, or even in 1945. However in 1950 a wholesale conversion seems 
to be under way. We are at last getting some really excellent books 
on statistics, and I have not seen a new bad book on the subject in 
years. This is a great contrast to earlier times. It now seems that, at 
the very least, a book on statistical methods must say something about 
probability and tests of significance. 

The reasons for the slow acceptance of Statistical Methods for Re- 
search Workers included the very novelty of the ideas in it, but there 
were also other causes. No proofs were given in the book, and the allu- 
sions to places where proofs could be found were not specifically related 
to passages in the book. In combination with the extreme economy of 
words in many critical passages, this absence of proofs left mathe- 
matical readers with a need of several years’ study before the ideas 
could really be mastered. The book was however primarily intended 



























46 AMERICAN STATISTICAL ASSOCIATION JOURNAL, MARCH 1951 


for biologists, and the large-type “Biological Monographs and Man- 
uals” on the cover and dust jacket has doubtless tended to drive away 
other readers. Biologists have often had great difficulty in trying to 
understand the book, but not so much difficulty as mathematicians, 
and in the early years could find no one to act as interpreter. 

It seems likely that any future books that may usher in such vast 
changes in statistical points of view will avoid these drawbacks. Such 
books will be written with a more fully developed and explicit basic 
theory in the background and will seem more coherent. The author will 
be able to give references to articles in which specific points receive 
satisfactory proofs, or give such proofs in the book. The Continental 
mathematical tradition of explicit and complete argument has now 
taken a firm hold both in this country and in England, and should help 
to minimize errors, misunderstandings, and controversies. Furthermore, 
a statistical public has developed in recent years that is far better able 
to judge and appreciate new basic ideas. Statistics is entering a new 
era of better methods, sounder basic ideas, more adequate mathe- 
matical criticism and constructive activity, faster progress, and greater 
usefulness in more and more kinds of application. For contributing a 
powerful impetus to this movement we have to thank Ronald Aylmer 
Fisher. 

THE FISHERIAN REVOLUTION IN METHODS
OF EXPERIMENTATION* 


W. J. YOUDEN
National Bureau of Standards 


THE FIRST edition of Fisher’s Statistical Methods for Research Workers
appeared in 1925. Among the fortunate possessors of copies of the
first edition were two groups of readers who were to be profoundly 
influenced by it, a small group of mathematicians and a larger group of 
scientists confronted with the task of interpreting the results of their 
experiments. 

In 1925 it was still true that the mathematical statisticians and the 
scientists doing experiments had almost no connection with each other. 
Almost two decades had elapsed since Student had, in 1908, published 
his paper in Biometrika on “The Probable Error of a Mean,” demon- 
strating that there did exist statistical procedures properly applicable 
to the limited sets of measurements arising in experimental work. 
Mathematicians had long ignored the statistical problems that arise in 
the interpretation of experimental data. Equally, most scientists were 
unacquainted with Student’s solution of a problem that confronts 
nearly everyone who makes measurements.

It appears to me that the most important contribution made by 
Fisher is that of bringing together the mathematical statisticians and 
the research workers. The relationship has grown beyond mere ac- 
quaintance: it now approaches fraternization. Unquestionably this 
came about because Fisher included within the covers of one book material
of great consequence in the theory as well as in the applications of
statistics. 

The research worker of course had no way of knowing that the first 
edition of Statistical Methods for Research Workers had within its cover 
a lot of statistical dynamite. It is perhaps as well for the research 
worker’s peace of mind that he was unaware how much was novel and, 
in a sense, not yet accepted by statisticians. I first got some hint that 
this book also held a message for mathematicians when I listened in on 
Professor Hotelling’s lectures on Statistical Inference nearly twenty 
years ago. He told the young men listening to him not to be misled by 
the large print, the wide margins, and a text almost devoid of mathe- 
matical symbols, that in this book were concepts as new to the theorists 
as to the researchers. The early reviews by professional statisticians 





* Presented at a joint meeting of the Institute of Mathematical Statistics and the American Sta- 
tistical Association at Chicago, December 29, 1950. 
considered that not all the material so confidently presented was on
firm ground. Meanwhile, a nucleus of research workers began to as- 
similate the ideas in Statistical Methods for Research Workers and to put 
the statistical methods to work. 

Perhaps there is no better way to show you the state of affairs in 
experimental research 25 years ago than to recall two or three examples 
from that period. I can do this because in 1924 I had my doctorate in 
chemistry, a well equipped laboratory, and a score of colleagues all 
engaged in experimental research. I remember a discussion between 
two chemists trying to appraise the difference between two averages 
in terms of the standard error of the difference. The ratio of difference 
to error was finally associated with odds of 49 to one. One of the 
chemists then asked, “Does that mean that the odds are 49 to one that 
the real difference between the averages is at least as large as the differ- 
ence we have found?” The question reveals how unequipped young 
men were in those days in respect to the interpretation of their meas- 
urements. 

The next example is chosen because more experienced workers were 
involved. It was a cooperative study involving government, industrial, 
and university scientists. The problem concerned the effects on the 
yield of a crop when thirty different chemical treatments were applied 
to the seeds. There was evidently a clear recognition that such large
numbers of treatments involved troublesome problems of soil
heterogeneity. A standard treatment was therefore selected and applied to
every seventh row across the field. The other treatments were applied 
to the remaining rows. After the crop was harvested the various co- 
operating workers retired with the yields and set about comparing the 
treatments. There was disagreement in the interpretation of the data. 
One group reasoned that it was logical to compare a treatment with 
the average of the two rows of standard treatment that bracketed the 
particular treatment. Another group argued that a particular treat- 
ment should be compared with the nearest row of the standard treat- 
ment. The ranking of some of the commercial preparations depended 
upon which one of these procedures was adopted. No one had given 
any thought to the problems of dealing with the data until they were 
actually in hand. 

In the past twenty-five years, a far-reaching and virtually revolu- 
tionary development in the technique of experimentation has been 
taking place. In many laboratories men began to ask what are the re- 
quirements that must be met in order to permit valid conclusions 
from experiments. The analysis of variance, by revealing what was 
involved in the contrast of variances, showed experimenters the
importance of placing the comparisons allotted to treatments on the same
basis as the comparisons used for the estimate of error. The process of 
random assignment or arrangement was seen to be an essential and 
easy method of giving the same opportunity to treatment and error 
comparisons. The experimenter had long attempted to take advantage 
of patches of steady conditions or homogeneous areas in his experimen- 
tal material. The analysis of variance for the first time provided a 
sound statistical technique to accompany experimental arrangements 
such as the replicated block and the Latin Square, which have proved 
so valuable that from them have evolved a large number of other 
experimental designs. Today the experimenter may choose a design 
which is appropriate for the special characteristics of his experimental 
material. Research workers began to use, often literally to copy, certain 
statistical procedures in Fisher’s book. These procedures were often 
imperfectly understood and consequently were sometimes misapplied. 
That workers would venture to use tools that they did not fully understand
is in itself evidence of the great dissatisfaction with past methods
of experimentation. We can state with confidence that if it were possible
to trace out the channel which led a research worker to adopt the new 
viewpoint it would lead, directly or indirectly, to Statistical Methods 
for Research Workers. 

One of the strongest channels for the flow of information was at the 
Statistics Laboratory of Iowa State College, the first of the great 
academic statistical centers in the United States. George W. Snedecor, 
the Director of the Laboratory, arranged for Fisher to lecture at two 
notable summer sessions in 1931 and 1936, and Snedecor in his teaching 
and in his well-known book made Fisher’s methods available to a host 
of workers. This laboratory converted Fisher’s original z-table into the 
convenient F-table, with entries for all commonly occurring combina- 
tions of degrees of freedom. 

Meanwhile, from all quarters of the globe men came to work with 
R. A. Fisher, first at Rothamsted Experiment Station, later at the 
Galton Laboratory, then at the Cambridge Department of Genetics. 
Last summer I examined the record at the Galton Laboratory of the
workers who studied with Fisher in the period 1934-1944. There are 
over fifty names, from more than a score of countries, working in a 
score of different scientific fields. The significant thing is that a large 
majority of these people were not statisticians, at least not mathe- 
maticians. They came because their own field of work needed the new 
statistical techniques. In many cases these workers have subsequently 
become so much in demand as teachers or consultants in statistics in
their original fields of science that they are now classified as statisticians 
by their old colleagues. 

The fields represented may be classified as agriculture, biology, medi- 
cine, or social science. The physical sciences, astronomy, physics, and 
chemistry, are represented by a solitary chemist; only very recently 
have there been signs of interest on the part of workers in the physical 
sciences. It is a foregone conclusion that in these fields also men will 
discover for themselves that the basic ideas first presented 25 years ago 
in Statistical Methods for Research Workers work on their problems too. 
Chemists are more and more aware that their duplicate determinations 
must not be run in parallel if the difference between duplicates is to be 
used to appraise the difference between samples not run in parallel. In 
other words, the requirements for a valid estimate of error are now 
generally understood. More chemists will learn that there are really 
effective ways of separating the sampling error from the analytical 
error without the necessity of running a score or more of analyses on 
the same sample. The replicated block and the Latin Square will be 
tried in many new situations and demonstrate anew their effectiveness. 
The elegance of the analysis of variance will be perceived and for a 
time some papers will tend to dwell more on the statistics than upon 
the experimental conclusions. After a bit the papers will merely cite 
the old classic. Eventually general knowledge of the methods will be 
taken for granted and the citations themselves will disappear. 

To all those who had part in facing the skeptics and conservatives 
there will remain the memory of the exciting days when in isolated 
areas the methods were given a trial and when new converts were 
made. It does not seem an exaggeration to suggest that the ideas in 
Statistical Methods for Research Workers will have doubled the informa- 
tion obtained from a measurement. 





R. A. FISHER’S STATISTICAL METHODS FOR 
RESEARCH WORKERS 


AN APPRECIATION 


KENNETH MATHER 
University of Birmingham 


BIOLOGICAL data are inevitably complex. No matter how hard we
try we cannot bring all sources of variation under control: our
data always show differences which arise from causes in which we are 
not interested. Such error variation can mask the effects of the agencies 
which we are concerned to investigate, in that the comparisons by 
which we must seek to establish the effects in which we are interested 
will always themselves reflect the variation arising from these uncon- 
trollable and often unrecognised sources. We are therefore always 
faced with deciding whether the comparisons which measure the effects 
of our experimental treatments reveal differences too big to be
reasonably ascribable to error variation.

In the older sciences of physics and chemistry, error variation is 
generally small enough to be negligible. It is generally easy to see 
whether the experimental treatment is having an effect, because any 
effect but the very smallest will be too big to be ascribable to sources 
of error. Early biological experimentation tended to follow the method- 
ology of chemistry and physics; but here the decision proved to be 
more difficult, because the error variation was no longer negligible. In 
consequence the cautious investigator, by insisting on clear evidence 
of the effects of his treatment, often missed smaller, but nevertheless 
real, differences because these could not be so obviously separated 
from the error variation. At the same time his less exacting fellow was 
led onto false trails by mistaking chance differences for genuine conse- 
quences of his treatments. 

Nor was statistics of any real help in reinforcing and sharpening the 
experimenter’s subjective judgement. The theory of probability had 
grown up as a mathematical discipline ill adapted for dealing with the 
problems of biological experiments. Galton and others had endeavoured 
to adjust the statistics of the time to meet biological needs, but their 
efforts were hamstrung from the first by the requirement that the sam- 
ple should be large before confident analysis became possible. To the 
biologist, statistics seemed a mockery when, after his weeks or months 
of painful experiment, its only advice to him was to go away and make 
ten times as many observations if he wanted to draw statistically
reliable conclusions. Naturally enough the biologists rebelled against this
tyranny. They went away and stayed away. Not until 1925, when 
Statistical Methods for Research Workers appeared, did any number of 
them begin to change their minds. 

The first step towards the statistics that biologists needed was taken 
in 1908, and then, interestingly enough, by one who was primarily 
neither a mathematician nor a biologist. W. S. Gosset, or “Student” as 
he will always be known to statisticians, was trained as a chemist; but 
he was concerned with the biological problems of the brewing industry. 
He had been made aware by experience of the nature and difficulties 
of biological variation. He knew that there existed no method of 
statistical analysis capable of dealing with the sort of experimental 
results which were the only basis he had for the executive decisions he 
had to make. So he set about devising a method for himself and the 
result was the first exact test of significance as we should call it; the 
first test that faced up to and overcame the difficulties of the small 
sample. His work was not appreciated by the professional statisticians 
and it was some years before anything further was done. But when it 
did come the next development was so wide and sweeping as to be 
revolutionary. 

R. A. Fisher had that rare combination of mathematical training and 
skill with biological understanding. He had been writing on the mathe- 
matical aspects of biological, and particularly genetical, problems al- 
most since his graduation from Cambridge, and, we may observe, 
without receiving any particular encouragement from either statistician 
or biologist. But when he moved to Rothamsted in 1919, his work be- 
came clear before him. The wealth of biological experimentation there 
in progress, agronomical, bacteriological and the rest, was essentially 
quantitative in nature. But it lacked quantitative methods of sufficient 
refinement and power. Fisher began to review and develop the whole 
field of statistics and always to the same end, not merely of giving the 
biologist the means of handling the kind of data to which biological 
experiment leads, but of giving him exact and rigorous methods of 
doing so. 

This was no mere task of applying known principles. The mathe- 
matical foundations had to be laid, “Student’s” approach generalised 
and strengthened to carry the immense structure which was to be 
built, the whole theory of tests of significance and of estimation
re-examined and developed, for in no other way could the needs of
Rothamsted, itself already a microcosm of biology, be met. The
mathematical foundation was presented in a series of papers in various
mathematical and statistical journals. But in 1925 came the publication
which really gave the biologist what he wanted. In Statistical Methods 
was presented the battery of new statistical methods, set out and 
illustrated by examples in such a way that the biologist—albeit with a 
little head scratching, for he was seldom mathematically minded by 
training and not often even by taste—could understand. 

In spite of some guarded reviews, of doubts about the wisdom of en- 
couraging biologists to run before they had learned to walk, the book 
rapidly achieved success. It did so for the perfectly simple reason that 
it gave biologists what they needed. There may have been some doubts 
and a little caution at first, but these were soon dispelled, for not 
merely did the methods look all right, they also worked. They enabled 
the biological experimenter to arrive, for the first time, at a balanced 
assessment of the meaning of his data, and to the younger biologists 
at least, they opened up new vistas of confident analysis and econom- 
ical experimentation. 

The work did not, of course, stop in 1925. Indeed, it had only begun. 
The steady expansion of Statistical Methods itself through its progress 
of editions—eleven up to 1949—is sufficient evidence of this. More 
was, however, to come. Statistical Methods proved to be but the first of 
a trilogy. The tables which had constituted such an invaluable feature 
of the book were expanded into Statistical Tables for Biological, Agri- 
cultural and Medical Research, published jointly with F. Yates in 1938. 
Like its predecessor, Statistical Tables has expanded its content, and 
achieved even further use with the progress of its editions. 

The remaining member of the trilogy—in point of chronological fact 
the second to appear—is The Design of Experiments, published in 1935. 
Although in a sense arising from and foreshadowed by Statistical Meth- 
ods, the Design has a different content and breaks deeper ground. The 
new methods had not merely shown how data could be most profitably 
analysed, they had also made clear how data could best be gained. Certain
requirements must be met if analysis was to extract all the information
the data could yield, so that criteria could be laid down by
which the efficiency of experimental methods themselves could be 
judged. It had become, in fact, possible to treat the design and conduct 
of experiments as an exact science. At the same time, consideration of 
the mathematical foundation had led Fisher to a new and more precise 
formulation of the logic of experimental science. These principles of 
inductive reasoning and their application in experimental design were 
set out in the new book. The Design does not offer that wide compen- 
dium of practical methods which has made Statistical Methods so 
widely popular, but it goes deeper. It shows us why those methods
work, enables us to appreciate and extend them, and gives us a new
insight into the meaning and practice of the experimental method. It may
sight into the meaning and practice of the experimental method. It may 
well be regarded as marking one of the great scientific advances of the 
half-century. 

The impact of Statistical Methods and its successors on biology has 
been tremendous. We have been given a mathematics of experiment. 
Biology has been made quantitative in a new and deeper sense, and 
biologists have become number conscious. We have been shown how to 
lay out our experiments efficiently and assess their results confidently. 
We have been taught to regard the statistician as a valuable friend who 
offers us help in using our limited resources and overcoming our per- 
plexing difficulties. And we have been thereby emboldened to tackle 
and solve problems otherwise impossible. This is as true in the field of 
applied biology as of pure. The breeder of hybrid corn owes as much to 
the new statistics as does his more purely genetical colleague. Neither 
could have got so far as he has without it. 

And perhaps most significantly of all, a generation of biologists is 
growing up to whom exact experimentation is a commonplace. In some 
universities, the biological students are already being taught the prin- 
ciples of statistics as an essential part of their training. Statistics is no 
longer a closed mystery. It is becoming an everyday tool, a mode of
thought which all biologists can employ, and to the development of
which some of us have been encouraged and enabled to contribute. To 
the biologist, Statistical Methods marks the beginning of this change. 
This is our justification for regarding 1925 as a significant date in the 
history not only of statistics but also of biology. 





THE THEORY OF STATISTICAL DECISION 


L. J. SAVAGE 
University of Chicago 


ABRAHAM WALD’s recent book, Statistical Decision Functions [10],
presents a new theory of the foundations of statistics.¹ The vigorous
exploration of this theory was begun by Professor Wald five or six
years ago and is being continued under his leadership. Since almost 
all published treatments of this theory known to me, including this one, 
are mathematically forbidding,² and since the theory promises to be of
great interest to all statisticians, it seems appropriate to attempt an 
informal exposition of it. The critical and philosophical remarks in 
this exposition may not accurately represent the views of Professor 
Wald, for both in writing and lecturing, he prefers to be rather non- 
committal on such points. 

Traditionally, the central problem of statistics is to draw statistical 
inferences, that is, to make reasonably secure statements on the basis 
of incomplete information. This entails other problems, particularly 
that of designing experiments which permit the strongest inference for
the expenditure involved. The new theory under discussion, however, 
centers about the problem of statistical action rather than inference, 
that is, deciding on a reasonable course of action on the basis of incom- 
plete information. There is clearly an abundance of situations calling 
for statistical action. Industrial quality control is a clear cut and famil- 
iar domain of examples. The problem of design evoked by the tradi- 
tional inference viewpoint is also a problem of statistical action. Much 
more generally, it can be argued that all problems of statistics, includ- 
ing those of inference, are problems of action, for to utter or publish 
any statement is, after all, to take a certain action. Since the conse- 
quences of statements regarded as actions, especially their influences 
on those to whom they are directed, are often unusually difficult to 
analyze and appraise, for example, when the statement refers to 
academic science, it is well to point out that many statements, es- 
pecially of applied science, are tantamount to deciding on a concrete 
action. Thus, medical diagnosis cannot in principle be separated from 
the choice of treatment. The typical agronomic experiment, though
commonly discussed in the language of inference, is plainly concerned
almost exclusively with deciding on a course of action.

1 This paper was undertaken as a review of Wald’s book. In requesting the
review, the Editor suggested that some exposition of the field to which
the book pertains be given prefatory to the review. This exposition so
extended itself that the paper is no longer exclusively a review.

This paper is in part based on research supported by the Office of Naval
Research.

2 A general exposition of the theory and an original example are given
elementary treatment in [7].

As has already been suggested, the term “course of action,” or 
briefly “act,” is to be understood in a flexible sense. The following chain 
of examples illustrates how different may be the scopes of the acts asso- 
ciated with a single situation, depending on context: 


(i) Decide whether to accept, i.e., to buy, a certain lot of screws. 

(ii) Decide among all possible rules for accepting or rejecting the lot, 
based on the thousand and one possible outcomes of inspection of 
a thousand screws chosen from the lot at random and classified 
as defective or nondefective. 

(iii) Decide among all possible experiments, including sequential ones, 
and all possible associated acceptance rules for accepting or re- 
jecting the lot. 


As the third example suggests, the whole design of a complicated sta- 
tistical program can be regarded as a single decision to adopt one of an 
enormous number of possible acts. In a highly idealized sense, a man 
might be thought of as making only one decision in his whole life, 
namely the decision to conduct himself according to some set of max- 
ims envisaging all eventualities. This is manifestly far-fetched, but the 
theory does call attention to the propriety of considering very large
decision problems as organic wholes. In practice, it will presumably
continue to be necessary to break such big decision problems into 
pieces, rigorously or approximately independent of one another. 

Acts have consequences for the actor, and these consequences de- 
pend on facts, not all of which are generally known to him. The un- 
known facts will often be referred to as states of the world, or simply 
states. Suppose, for example, that on a given occasion you face the
decision of whether or not to carry an umbrella, and consider for
simplicity only two possible states, future rain and future shine. The
consequences of each act in each possible state might be given by the 
following table:

                          State
Act            Rain                          Shine
Carry          Inconvenience and wet feet    Inconvenience and slight embarrassment
Don’t carry    Miserable drenching           Bliss unalloyed

In some situations the possible consequences of each act in each
state are cash incomes, or at any rate expressible in terms of cash
incomes. Thus in the rain-shine example above, you might perhaps
consider the monetary values expressed in dollars by the following
“income matrix” reasonable:

                  State
Act             Rain    Shine
Carry              4        5
Don’t carry      -10       10

It will be assumed in this exposition, except in the next paragraph 
where some other possibilities are mentioned in passing, that in the 
contexts to be discussed the consequence of any act is to all intents and 
purposes cash income (possibly zero or negative) which may depend on 
which of the possible states of the world is in fact the real one, and 
that a situation with a known expected (or actuarial) cash income is as 
valuable to the actor as a pure cash income of the same amount. The 
terms are to be understood in such a sense that the actor knows how 
his income depends on his action and the unknown state of the world, 
i.e., he knows the income matrix of the situation.

The assumptions just made are more restrictive than is logically
necessary but are adopted here for definiteness. It is enough that to 
each possible consequence there should correspond a numerical score 
such that higher scores are preferred to lower scores, and a given ex- 
pected score is as acceptable as the corresponding sure score. Cash 
income, which has here been adopted as the epitome of such a score, 
does in many situations have the postulated properties. Other possible 
scores are expected number of lives saved, and probability of over-all 
success in some venture. The von Neumann-Morgenstern theory of 
utility³ suggests that very generally there will exist some suitable
score. 

3 This theory in its modern form is discussed by its authors in [9]. The
more elementary [5] may also be consulted. Paul A. Samuelson has pointed
out to me that [5] errs in asserting that three propositions listed on
p. 288 are equivalent to the von Neumann-Morgenstern system.

If in a given situation the actor assigns probabilities to the various
unknown states of the world, he can calculate unambiguously the expected
cash income associated with any action. By the assumptions already
made, he then acts in such a way as to maximize his expected cash
income, and the decision problem is relatively trivial and
old-fashioned.⁴ If the actor does not assign probabilities, this trivial
solution does not apply and the problem is newer. In fact, the problem of
dealing with uncertainty when probability does not apply to the un- 
known states of the world has been the main theme of modern, or un- 
Bayesian, statistical theory. The actor may attach probability to some 
aspects of the unknown states but not to others. For example, he 
might have to do something about a coin known to have an unknown
probability of heads. This seemingly more general situation can be 
shown to reduce to that in which no probabilities are assigned. Sup- 
pose, for example, the problem is to guess how a certain coin will fall 
on a single trial, possibly after some experimentation, and that it is 
known that for this coin heads had probability either one-third or
two-thirds. Clearly the problem reduces to that of guessing which of the
two possible probabilities applies. 
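When probabilities are assigned, the maximization just described can be checked directly on the rain-shine income matrix. The following is a minimal Python sketch, assuming the income matrix given earlier (carry: 4, 5; don't carry: -10, 10) with states ordered (rain, shine). The names INCOME, expected_income, and best_acts are illustrative only. It recovers the 5/19 threshold for the probability of rain stated in footnote 4.

```python
from fractions import Fraction

# Rain-shine income matrix in dollars; each tuple is ordered (rain, shine).
INCOME = {
    "carry": (Fraction(4), Fraction(5)),
    "don't carry": (Fraction(-10), Fraction(10)),
}


def expected_income(act, p_rain):
    """Expected cash income of an unmixed act when rain has probability p_rain."""
    rain, shine = INCOME[act]
    return p_rain * rain + (1 - p_rain) * shine


def best_acts(p_rain):
    """All acts that maximize expected income (more than one when tied)."""
    values = {act: expected_income(act, p_rain) for act in INCOME}
    top = max(values.values())
    return sorted(act for act, value in values.items() if value == top)


for p in (Fraction(1, 10), Fraction(5, 19), Fraction(1, 2)):
    print(p, best_acts(p))
```

Below 5/19 the sketch prefers not carrying and above it prefers carrying. At exactly 5/19 both acts tie. Exact fractions are used so that the tie is detected without rounding error.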

Even if probabilities are not assigned, there is one unquestionably
appropriate criterion for preferring some act to some others: If for
every possible state, the expected income of one act is never less and is 
in some cases greater than the corresponding income of another, then 
the former act is preferable to the latter. This obvious principle is 
widely used in everyday life and in statistics, but only occasionally does 
it lead to a complete solution of a decision problem. 
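The dominance criterion can be written as a short predicate. This Python sketch assumes that incomes are listed state by state in a fixed order. The function name dominates and the extra act paying (3, 5) are hypothetical and serve only as illustration. Applied to the rain-shine incomes, it shows that neither act dominates the other, so this principle alone does not settle that problem.

```python
def dominates(income_a, income_b):
    """True if act A's income is never less than act B's in any state
    and strictly greater in at least one state."""
    pairs = list(zip(income_a, income_b))
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)


# Rain-shine incomes, ordered (rain, shine).
carry = (4, 5)
dont_carry = (-10, 10)
print(dominates(carry, dont_carry), dominates(dont_carry, carry))  # False False

# A hypothetical extra act paying 3 in rain and 5 in shine is dominated by carrying.
print(dominates(carry, (3, 5)))  # True
```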

Before going further, the idea of a mixed act, or mixed strategy, as it 
is commonly called, should be introduced. Instead of choosing a par- 
ticular act outright, the actor might prefer to let chance play such a 
role in his choice that each act has a probability determined by the 
actor of being chosen. For example, in a choice between two acts, the 
actor might flip two coins, agreeing in advance to choose the first act 
if both coins fall heads, and otherwise to choose the second. The ad- 
mission of mixed acts obviously does not restrict the actor since he can, 
if he chooses, attach probability one to any particular act, that is, 
choose it outright. On the other hand, it is not offhand clear that it is 
of any advantage to consider mixed acts; indeed, the contrary can be 
seriously argued. The fact that the process of randomization, so essen- 
tial to modern statistical methods, is an employment of mixed acts, 
certainly suggests that they may be advantageous. 
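In each state, a mixed act's expected income is the probability-weighted average of the incomes of the unmixed acts it combines. The Python sketch below uses the two-coin rule from the text: the first act is chosen only if both coins fall heads, which happens with probability 1/4. For illustration it assumes the two acts are the rain-shine acts. The names mixed_income and choose_by_two_coins are hypothetical.

```python
import random
from fractions import Fraction


def mixed_income(weights, incomes):
    """State-by-state expected income of a mixed act.

    weights: probability attached to each unmixed act (must sum to 1).
    incomes: one tuple per unmixed act, giving its income in each state.
    """
    if sum(weights) != 1:
        raise ValueError("weights must sum to 1")
    n_states = len(incomes[0])
    return tuple(
        sum(w * income[s] for w, income in zip(weights, incomes))
        for s in range(n_states)
    )


def choose_by_two_coins(rng):
    """Flip two fair coins; return 0 (first act) only if both fall heads."""
    first_heads = rng.random() < 0.5
    second_heads = rng.random() < 0.5
    return 0 if first_heads and second_heads else 1


# Rain-shine incomes, ordered (rain, shine): carry first, don't carry second.
incomes = [(4, 5), (-10, 10)]
weights = [Fraction(1, 4), Fraction(3, 4)]  # P(both heads) = 1/4
print(mixed_income(weights, incomes))  # rain: -13/2, shine: 35/4

rng = random.Random(0)
trials = 100_000
share_first = sum(choose_by_two_coins(rng) == 0 for _ in range(trials)) / trials
print(share_first)  # close to 0.25
```

Putting weight 1 on a single act, as in mixed_income([1, 0], incomes), returns that act's own incomes. This is the sense in which admitting mixed acts does not restrict the actor.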

4 In the rain-shine example, if probability applies to the states, easy
calculation shows that you should carry, not carry, or be indifferent
according as the probability of rain is more than, less than, or equal
to 5/19.

To state the general rule by which the theory of statistical decision
functions tentatively proposes to solve all decision problems, some
mathematical notation is appropriate. Let I(a, s) denote the expected
income resulting from the (mixed) act a if the world is in state s, and 
assume for simplicity that the number of possible states s and of un- 
mixed acts is finite, though possibly enormous. Understand by the 
loss, L(a, s), associated with the act a and state s, the difference be- 
tween (i) the most that can be made by any act if the state s obtains 
and (ii) the amount the act a makes if s obtains; that is, in notation 
which will explain the words to the mathematical and the notation 
“max” to the verbal,

$$L(a, s) = \max_{a'} I(a', s) - I(a, s).$$

The loss may be said to measure how inappropriate the action a is 
in the state s. In the rain-shine example, it is easily verified that the 
loss for the unmixed acts is:

                  State
Act             Rain    Shine
Carry              0        5
Don’t carry       14        0

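The loss definition can be computed mechanically from an income matrix. For each state, take the best income that any unmixed act achieves and subtract each act's income from it. This is a minimal Python sketch, assuming incomes are given state by state for unmixed acts only; by footnote 5, no mixed act does better than the best unmixed act in any single state. The name loss_matrix is illustrative. Applied to the rain-shine incomes, it reproduces the loss table above.

```python
def loss_matrix(income):
    """Compute L(a, s) = max over a' of I(a', s) - I(a, s) for unmixed acts.

    income maps each act to a tuple of incomes, one entry per state.
    """
    n_states = len(next(iter(income.values())))
    best = [max(row[s] for row in income.values()) for s in range(n_states)]
    return {
        act: tuple(best[s] - row[s] for s in range(n_states))
        for act, row in income.items()
    }


# Rain-shine income matrix, ordered (rain, shine).
income = {"carry": (4, 5), "don't carry": (-10, 10)}
print(loss_matrix(income))  # carry: (0, 5); don't carry: (14, 0)
```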
If you give yourself probability P of carrying the umbrella and 1 - P
of not carrying it, the loss is given by

                State
Rain                    Shine
14(1 - P)               5P

The following general rule, called the minimax principle, is central
to the theory of statistical decision functions, at least today. In fact,
it is the only rule of comparable generality proposed since Bayes’ was
published in 1763 [1]. The minimax principle is: Choose such an a
that $\max_s L(a, s)$ shall be as small as possible, that is, minimize the
maximum loss. There may be more than one act fulfilling the requirement
of the principle, in which case other criteria are generally invoked,
particularly the trivial one mentioned a few paragraphs above concerning
two acts one of which has in each state as high an income as the
other.

5 This easy calculation is made even easier by the reflection that there
is always some unmixed act as appropriate to a given state as any mixed
one.

Examples are imperative here. Though small and artificial, they 
will exhibit many salient features of the general situation. First, con- 
sider the rain-shine example. If P=14/19, L=70/19. For lower P the 
loss is more than 70/19 in case of rain, and for higher P it is more than 
70/19 in case of shine. Carrying the umbrella with probability 14/19 
is therefore the minimax act in this utterly academic situation. 
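For two acts and two states, the minimax mixing probability can be found by checking three candidates: the endpoints P = 0 and P = 1, and the point where the expected losses in the two states are equal. This works because the larger of two linear functions of P is convex. This Python sketch is restricted to the 2 × 2 case, and its function names are hypothetical. It reproduces P = 14/19 and a maximum loss of 70/19 for the rain-shine loss table.

```python
from fractions import Fraction


def max_loss(p, loss_a, loss_b):
    """Largest expected loss over the states when act A has probability p."""
    return max(p * la + (1 - p) * lb for la, lb in zip(loss_a, loss_b))


def minimax_two_by_two(loss_a, loss_b):
    """Minimax probability of act A, and its maximum loss, for 2 acts x 2 states.

    The expected loss in state s is loss_b[s] + p * (loss_a[s] - loss_b[s]),
    a straight line in p. The larger of the two lines is convex, so its
    minimum on [0, 1] is at an endpoint or where the lines cross.
    """
    candidates = [Fraction(0), Fraction(1)]
    slopes = [la - lb for la, lb in zip(loss_a, loss_b)]
    if slopes[0] != slopes[1]:
        crossing = Fraction(loss_b[1] - loss_b[0], slopes[0] - slopes[1])
        if 0 <= crossing <= 1:
            candidates.append(crossing)
    best_p = min(candidates, key=lambda q: max_loss(q, loss_a, loss_b))
    return best_p, max_loss(best_p, loss_a, loss_b)


# Rain-shine losses, ordered (rain, shine).
carry_loss = (0, 5)
dont_carry_loss = (14, 0)
p, value = minimax_two_by_two(carry_loss, dont_carry_loss)
print(p, value)  # 14/19 70/19
```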

Turn now to a more elaborate example. Let it be given that of four 
numbered coins three are pennies and one is a dime, or else one is a 
penny and three are dimes. There are thus eight possible states s, be- 
cause any of the four coins may be the singular one, and in two ways. 
Let an unmixed action consist in this: first, examine one of the coins 
chosen by number or refrain from so doing; second, guess the denomi- 
nation of the singular coin. Let your income be computed on the follow- 
ing basis: 

(i) If the singular coin is a penny you pay a tax of $10, if it is a dime 
you receive a bonus of the same amount. 
(ii) If you choose to examine a coin you must pay $1 inspection cost. 
(iii) If your guess was incorrect, you pay a $4 penalty. 


The first step in applying the minimax rule is to compute the loss
function. Since item (i) of income does not depend on your act, it plays
no role in the loss. On the other hand, if you knew s, you could act in 
such a way as to avoid both inspection cost and penalty; thus, insofar 
as you incur them, they are your loss. If you choose not to examine any 
coin, you might, for example, guess that the singular coin is a penny if 
a fair coin of your own falls heads. The expected loss for this act is 
easily seen to be $2 no matter what s may be. It is almost obvious that 
any other act avoiding the inspection cost can for suitable s lose more. 
For example, both unmixed acts in this category can lose $4. 

Can you do better by examining a coin, despite the inspection cost of 
$1? Not if you choose a definite coin, for this does not change the situa- 
tion with respect to possible penalty, and it adds the inspection cost. 
If, however, you choose one of the four coins at random, and guess 
that the singular coin differs from it in denomination, you will be right 
three times in four and your expected loss will be $1+3X$2=$1.50, 
no matter what s may be. It will, therefore, be better to pay the in- 
spection cost (at least some of the time). 

In fact, the act just described is minimax, as may be seen by the fol- 
lowing line of argument, which is of very general applicability. If the 
loss be averaged with equal weights over each of the eight possible
states, it can be seen with patience that no unmixed act can have a 
smaller average loss than $1.50. If this is true for unmixed acts it 
follows immediately for mixed ones. Since the average loss of any act is 
at least $1.50, the maximum loss of any act must be at least that great, 
which was to be proved. The effective weights will not in general be 
equal; their equality in this example was contrived for simplicity of 
exposition. 
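The averaging argument can also be illustrated on the rain-shine example instead of the coin example. In this Python sketch, the weights 5/19 on rain and 14/19 on shine are an assumption chosen for illustration, because they give both unmixed acts the same weighted loss. The function name is hypothetical. The smallest weighted-average loss over unmixed acts is a lower bound on every act's maximum loss. Here that bound equals the maximum loss of carrying with probability 14/19, so that act is minimax.

```python
from fractions import Fraction


def weighted_loss_lower_bound(weights, losses):
    """Smallest weighted-average loss over the unmixed acts.

    A mixed act's weighted-average loss is an average of these values, so it
    cannot be smaller, and no act's maximum loss is below its average loss.
    """
    return min(
        sum(w * loss for w, loss in zip(weights, row)) for row in losses.values()
    )


# Rain-shine losses, ordered (rain, shine).
losses = {"carry": (0, 5), "don't carry": (14, 0)}
weights = (Fraction(5, 19), Fraction(14, 19))  # illustrative choice of weights

bound = weighted_loss_lower_bound(weights, losses)

# Maximum loss of carrying with probability P = 14/19: max(14(1 - P), 5P).
p = Fraction(14, 19)
max_loss_of_mixed_act = max(14 * (1 - p), 5 * p)
print(bound, max_loss_of_mixed_act)  # 70/19 70/19
```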

Little, if any, reason for adopting the minimax principle seems to 
have been published, though the literature suggests that some individ- 
uals find it intuitively acceptable, and demonstrates that almost all 
ordinary principles of statistics can, with suitable special assumptions 
about the loss function, be exhibited as special cases of it. This fact 
must not be taken too seriously as argument for the principle, since by 
postulating a sufficiently arbitrary loss function literally any rule of 
action can be seen as a special (or limiting) case of it. The principle 
that unbiasedness is a desirable characteristic of estimates is an exam- 
ple of a principle which does not follow from the minimax principle 
without straining the interpretation of terms, but it is noteworthy 
that the principle of unbiasedness in estimation is currently held in 
low esteem by the consensus of statisticians. 

It seems clear that no categorical justification for the minimax rule 
can be given. Suppose, for example, that you must choose sides in an 
even money bet that Switzerland will become a monarchy sometime in 
1951. If the concept of probability is not applicable to that event, and 
it is typical of the sort of event to which modern statisticians generally 
consider probability inapplicable, the rule says you should bet on 
monarchy if a fair coin falls heads. 

Stimulated by the ideas of Bruno de Finetti, as expressed in [3] and 
[4], I would tentatively suggest the following as one possible motivation 
for the minimax rule. I here suppose, contrary to the opinion of most 
modern statisticians, that an individual actor faced with a choice 
among actions will personally find some more attractive than others. 
More fully, he will feel each to be equivalent to a certain cash income. 
Such an individual will simply choose the act which he feels is equiva- 
lent to the highest cash income. You, for example, would reject an 
even money bet that Switzerland will become a monarchy in 1951. 
(If not, please wire collect.) If, however, the actor is a group of individ- 
uals who must act in concert with a view to augmenting the income 
of the group, the situation is radically different, and the problems of 
statistics may often, if not always, be considered to be of this sort. 
ing two acts one of which has in each state as high an income as the
other. 

Examples are imperative here. Though small and artificial, they 
will exhibit many salient features of the general situation. First, con- 
sider the rain-shine example. If P=14/19, L=70/19. For lower P the 
loss is more than 70/19 in case of rain, and for higher P it is more than 
70/19 in case of shine. Carrying the umbrella with probability 14/19 
is therefore the minimax act in this utterly academic situation. 
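The rain-shine loss table appears earlier in the article and is not repeated here. As a minimal sketch, assuming two unmixed acts and zero loss for the better act in each state (as the definition of loss requires), the two remaining losses can be back-solved from P = 14/19 and L = 70/19: 14 for going without the umbrella in rain and 5 for carrying it in shine. These two figures are inferred for illustration, not quoted, and the function name equalizing_mixture is hypothetical.

```python
from fractions import Fraction


def equalizing_mixture(loss_no_umbrella_in_rain, loss_umbrella_in_shine):
    """Minimax probability P of carrying the umbrella, and the resulting maximum loss.

    With P the probability of carrying, the expected loss is (1 - P) * a in rain
    and P * b in shine. One rises as the other falls, so the larger of the two is
    smallest where they are equal: P = a / (a + b).
    """
    a = Fraction(loss_no_umbrella_in_rain)
    b = Fraction(loss_umbrella_in_shine)
    p = a / (a + b)
    return p, (1 - p) * a


# Hypothetical losses, back-solved from P = 14/19 and L = 70/19.
P, L = equalizing_mixture(14, 5)
```

Any P below 14/19 raises the loss in rain above L, and any P above it raises the loss in shine, which is the argument of the text in arithmetic form.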

Turn now to a more elaborate example. Let it be given that of four 
numbered coins three are pennies and one is a dime, or else one is a 
penny and three are dimes. There are thus eight possible states s, be- 
cause any of the four coins may be the singular one, and in two ways. 
Let an unmixed action consist in this: first, examine one of the coins 
chosen by number or refrain from so doing; second, guess the denomi- 
nation of the singular coin. Let your income be computed on the follow- 
ing basis: 


(i) If the singular coin is a penny you pay a tax of $10, if it is a dime 
you receive a bonus of the same amount. 
(ii) If you choose to examine a coin you must pay $1 inspection cost. 
(iii) If your guess was incorrect, you pay a $4 penalty. 


The first step in applying the minimax rule is to compute the loss 
function. Since item (i) of income does not depend on your act, it plays 
no role in the loss. On the other hand, if you knew s, you could act in 
such a way as to avoid both inspection cost and penalty; thus, insofar 
as you incur them, they are your loss. If you choose not to examine any 
coin, you might, for example, guess that the singular coin is a penny if 
a fair coin of your own falls heads. The expected loss for this act is 
easily seen to be $2 no matter what s may be. It is almost obvious that 
any other act avoiding the inspection cost can for suitable s lose more. 
For example, both unmixed acts in this category can lose $4. 
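A small enumeration of the eight states makes the claims about acts without inspection concrete. This sketch covers only acts that avoid the inspection cost, so the $4 penalty of item (iii) is the only loss they can incur; the names STATES, loss_without_inspection and worst_case are illustrative.

```python
from fractions import Fraction
from itertools import product

PENALTY = 4  # item (iii); item (i) is the same whatever you do, so it is not a loss

# A state is (position of the singular coin, its denomination): 4 x 2 = 8 states.
STATES = list(product(range(4), ("penny", "dime")))


def loss_without_inspection(p_guess_penny, state):
    """Expected loss when no coin is examined and "penny" is guessed with probability p."""
    _, singular = state
    p_wrong = (1 - p_guess_penny) if singular == "penny" else p_guess_penny
    return PENALTY * p_wrong


def worst_case(p_guess_penny):
    """Largest expected loss over the eight states for a given guessing probability."""
    return max(loss_without_inspection(p_guess_penny, s) for s in STATES)


coin_toss = [loss_without_inspection(Fraction(1, 2), s) for s in STATES]
```

Any guessing probability other than one half leaves one denomination more exposed, so its worst case exceeds $2.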

Can you do better by examining a coin, despite the inspection cost of 
$1? Not if you choose a definite coin, for this does not change the situa- 
tion with respect to possible penalty, and it adds the inspection cost. 
If, however, you choose one of the four coins at random, and guess 
that the singular coin differs from it in denomination, you will be right 
three times in four and your expected loss will be $1 + ¼ × $2 = $1.50,
no matter what s may be. It will, therefore, be better to pay the in- 
spection cost (at least some of the time). 

In fact, the act just described is minimax, as may be seen by the fol- 
lowing line of argument, which is of very general applicability. If the 
loss be averaged with equal weights over each of the eight possible
states, it can be seen with patience that no unmixed act can have a 
smaller average loss than $1.50. If this is true for unmixed acts it 
follows immediately for mixed ones. Since the average loss of any act is 
at least $1.50, the maximum loss of any act must be at least that great, 
which was to be proved. The effective weights will not in general be 
equal; their equality in this example was contrived for simplicity of 
exposition. 

Little, if any, reason for adopting the minimax principle seems to 
have been published, though the literature suggests that some individ- 
uals find it intuitively acceptable, and demonstrates that almost all 
ordinary principles of statistics can, with suitable special assumptions 
about the loss function, be exhibited as special cases of it. This fact 
must not be taken too seriously as argument for the principle, since by 
postulating a sufficiently arbitrary loss function literally any rule of 
action can be seen as a special (or limiting) case of it. The principle 
that unbiasedness is a desirable characteristic of estimates is an exam- 
ple of a principle which does not follow from the minimax principle 
without straining the interpretation of terms, but it is noteworthy 
that the principle of unbiasedness in estimation is currently held in 
low esteem by the consensus of statisticians. 

It seems clear that no categorical justification for the minimax rule 
can be given. Suppose, for example, that you must choose sides in an 
even money bet that Switzerland will become a monarchy sometime in 
1951. If the concept of probability is not applicable to that event, and 
it is typical of the sort of event to which modern statisticians generally 
consider probability inapplicable, the rule says you should bet on 
monarchy if a fair coin falls heads. 

Stimulated by the ideas of Bruno de Finetti, as expressed in [3] and 
[4], I would tentatively suggest the following as one possible motivation 
for the minimax rule. I here suppose, contrary to the opinion of most 
modern statisticians, that an individual actor faced with a choice 
among actions will personally find some more attractive than others. 
More fully, he will feel each to be equivalent to a certain cash income. 
Such an individual will simply choose the act which he feels is equiva- 
lent to the highest cash income. You, for example, would reject an 
even money bet that Switzerland will become a monarchy in 1951. 
(If not, please wire collect.) If, however, the actor is a group of individ- 
uals who must act in concert with a view to augmenting the income 
of the group, the situation is radically different, and the problems of 
statistics may often, if not always, be considered to be of this sort. 
Thus, for example, if an experimenter claims to have evidence that a
certain agronomic technique is preferable to others, this evidence 
should, so far as possible, look convincing to everyone concerned. 
Again, one might idealize at least part of the activity of academic 
science in terms of decisions reached jointly by the whole group of 
people interested and competent in the science in question. 

Suppose, then, that the individual s in a group evaluates the action a
to be equivalent to the income I(a, s). The parallelism between the
notation I(a, s) here and in its earlier contexts is more than formal, for
the individual s may well be said to believe that the world is in such a 
state that the expected income of a is I(a, s). If the action a should be 
chosen, the individual s would consider that the income L(a, s) had 
been lost. Application of the minimax principle from this point of view 
means to act so that the greatest violence done to anyone’s opinion 
shall be as small as possible. Such an action may sometimes be justi- 
fiably adopted, especially if the greatest violence resulting from it is 
too small for anyone to wrangle over. Where effective experimentation 
is a component of some of the possible actions, practical agreement may 
well be reached in this way. For unless two opinions are originally 
utterly incompatible, enough relevant evidence will bring them close 
together, and even if they are utterly incompatible the holder of each 
will feel that he has little to lose in agreeing to a sufficiently extensive 
fair trial. For example, you might be rather sure that a particular coin 
is fair and I just as sure that it will fall heads three times out of four, 
and we might both agree that these are the only two possibilities worthy 
of consideration. If the cost of observing a thousand tosses of the coin 
is very small compared with the stakes on which our joint action de- 
pends, we may well be satisfied to abide by an experiment based on the 
thousand tosses, because for each the loss is little more than the small 
cost of the experiment. In connection with the interpretation of the 
minimax principle as a principle for group action, it is relevant to men- 
tion the technical fact that in many cases, under the minimax rule the 
same amount will be given up by each member of the group. 
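The coin-tossing illustration can be checked numerically. In the sketch below the decision rule (act on "heads three times in four" whenever at least 625 of the 1000 tosses fall heads) is a hypothetical choice, the midpoint of 500 and 750; exact binomial sums then show how rarely the experiment would point the wrong way under either opinion.

```python
from math import comb

TOSSES = 1000
CUTOFF = 625  # hypothetical rule: act on "heads three times in four" if heads >= 625


def binomial_prob(n, ks, heads_weight, tails_weight):
    """Exact P(X in ks) for X ~ Binomial(n, p), p = heads_weight / (heads_weight + tails_weight).

    Integer weights keep the sum exact; the single final division yields a float.
    """
    total = (heads_weight + tails_weight) ** n
    hits = sum(comb(n, k) * heads_weight**k * tails_weight ** (n - k) for k in ks)
    return hits / total


# Chance the experiment misleads the holder of each opinion if that opinion is true.
misled_if_fair = binomial_prob(TOSSES, range(CUTOFF, TOSSES + 1), 1, 1)
misled_if_three_quarters = binomial_prob(TOSSES, range(CUTOFF), 3, 1)
```

Both probabilities are far below one in a billion, so for each party the expected loss from abiding by the experiment is essentially its cost.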

There might well be some argument for preferring to minimize the 
average of L(a, s) over the group, or for adopting some other “demo- 
cratic” principle. But it should be remembered that in many applica- 
tions of statistics, it may be impractical to define membership in the 
group, let alone to poll the members. In such a case it is generally much 
more feasible for the responsible statistician to treat all “reasonable” 
opinions by the minimax principle. 
The mathematical problems posed by the minimax rule are in prin-
ciple exactly the same as those posed by the theory of (zero-sum) two-person
social games formulated by von Neumann.⁶ But studies of the
minimax rule have a special mathematical tone because there are fre- 
quently infinitely many states and unmixed actions in statistical prob- 
lems, whereas the analogous problem of games in which the players 
have an infinite number of moves is rather academic. The close parallel- 
ism between the theory of games and that of the minimax rule is very 
stimulating to both theories, but it has tended to direct attention 
away from the great conceptual differences between them. 

There is a general principle for finding minimax actions when the
number of states and unmixed actions is finite, but it leads in general to
very extensive computations and is not applicable at all when either 
of these numbers is infinite. Devices for solving special classes of mini- 
max problems are therefore much sought after, and even now many 
of the most commonplace situations of statistics lead to difficult mini- 
max problems. 
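The general principle alluded to amounts, in modern terms, to a linear program. As a lighter stand-in, and explicitly a different technique, the sketch below uses Brown's method of fictitious play, which approximates a minimax mixed act for a finite loss table and brackets the minimax loss between a weighted-average lower bound and a worst-state upper bound. The rain-shine losses in the usage line (14 and 5) are back-solved from P = 14/19 and L = 70/19 and are an assumption; the function name is illustrative.

```python
def fictitious_play(loss, rounds=100_000):
    """Approximate a minimax mixed act for a finite loss table by fictitious play.

    loss[a][s] is the loss of unmixed act a in state s. The actor repeatedly picks
    the act with least total loss against the states chosen so far, while an
    adversarial "nature" picks the state with greatest total loss against the acts
    chosen so far. Returns the empirical mixture of acts together with a lower and
    an upper bound on the minimax loss.
    """
    n_acts, n_states = len(loss), len(loss[0])
    act_counts = [0] * n_acts
    state_counts = [0] * n_states
    act_totals = [0.0] * n_acts      # total loss of each act against past states
    state_totals = [0.0] * n_states  # total loss in each state of past acts
    a, s = 0, 0  # arbitrary opening choices
    for _ in range(rounds):
        act_counts[a] += 1
        state_counts[s] += 1
        for i in range(n_acts):
            act_totals[i] += loss[i][s]
        for j in range(n_states):
            state_totals[j] += loss[a][j]
        a = min(range(n_acts), key=act_totals.__getitem__)
        s = max(range(n_states), key=state_totals.__getitem__)
    mix = [c / rounds for c in act_counts]
    weights = [c / rounds for c in state_counts]
    # Any mixture's worst state bounds the minimax loss from above; the best act
    # against any weighting of the states bounds it from below.
    upper = max(sum(mix[i] * loss[i][j] for i in range(n_acts)) for j in range(n_states))
    lower = min(sum(weights[j] * loss[i][j] for j in range(n_states)) for i in range(n_acts))
    return mix, lower, upper


# Rain-shine losses back-solved from P = 14/19, L = 70/19 (an assumption):
# rows are (carry umbrella, leave it), columns are (rain, shine).
mix, lower, upper = fictitious_play([[0, 5], [14, 0]])
```

The lower bound is the same device as averaging the loss with fixed weights over the states; fictitious play simply searches for weights that make it tight. The method converges slowly, which is one reason exact linear-programming solutions are preferred in practice.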

Few, if any, new minimax solutions of immediate practical impor- 
tance have yet been found. This is due partly to the difficulty just al- 
luded to and partly to the understandable preoccupation of Wald and 
others working in the field with fundamental theoretical questions. On 
the other hand, the theory of statistical decision, and more particu- 
larly the minimax rule, has led to some profound theoretical results, 
especially in sequential analysis, and has greatly unified the whole 
field of theoretical statistics. 

It is often said that the minimax principle is founded on ultra-pessi- 
mism, that it demands that the actor assume the world to be in the 
worst possible state. This point of view comes about because neither 
Wald nor other writers have clearly introduced the concept of loss as 
distinguished from negative income. But Wald does frequently say 
that in most, if not all, applications I(a, s) is never positive and that it
vanishes for each s if a is properly chosen, which is the condition that
minus I(a, s) = L(a, s). Application of the minimax rule to minus
I(a, s) generally, instead of to L(a, s), is indeed ultra-pessimistic; no 
serious justification for it has ever been suggested, and it can lead to 
the absurd conclusion in some cases that no amount of relevant experi- 
mentation should deter the actor from behaving as though he were in 
complete ignorance. In the more elaborate example discussed a few 





6 von Neumann opened the subject in [8], which is still in many ways the best introduction to it;
but chapters III and IV of [9] are newer and fuller. 
paragraphs above, for example, if you applied the minimax principle
to negative income your behavior would be not only affected but ac- 
tually dominated by part (i) of the income, which is irrelevant to the 
loss, and you would always guess “penny” even if inspection cost were 
reduced to zero. 
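A direct computation shows the effect. The sketch compares only four illustrative acts (not every possible act) and ranks them by their worst-case expected negative income, built from the tax, inspection cost and penalty of items (i) to (iii); names such as negative_income are hypothetical.

```python
from fractions import Fraction
from itertools import product

TAX = 10      # item (i): paid if the singular coin is a penny, received if it is a dime
PENALTY = 4   # item (iii)
STATES = list(product(range(4), ("penny", "dime")))  # (position, denomination) of singular coin


def other(denom):
    return "dime" if denom == "penny" else "penny"


def seen(coin, state):
    """Denomination shown by numbered coin `coin` when the world is in `state`."""
    position, singular = state
    return singular if coin == position else other(singular)


# Each act: (examine a randomly chosen coin?, rule giving the probability of guessing "penny").
ACTS = {
    "guess penny, no inspection": (False, lambda shown: 1),
    "guess dime, no inspection": (False, lambda shown: 0),
    "toss a fair coin, no inspection": (False, lambda shown: Fraction(1, 2)),
    "inspect a random coin, guess the other denomination": (True, lambda shown: 1 if shown == "dime" else 0),
}


def negative_income(act, state, inspection_cost):
    """Expected negative income -I(a, s), averaging over the random choice of coin."""
    inspect, p_penny = ACTS[act]
    _, singular = state
    shown_options = [seen(i, state) for i in range(4)] if inspect else [None]
    value = Fraction(TAX if singular == "penny" else -TAX)
    for shown in shown_options:
        p = Fraction(p_penny(shown))
        p_wrong = 1 - p if singular == "penny" else p
        value += PENALTY * p_wrong / len(shown_options)
    return value + (inspection_cost if inspect else 0)


def minimax_on_negative_income(inspection_cost):
    """Act whose largest expected negative income over the eight states is smallest."""
    return min(ACTS, key=lambda act: max(negative_income(act, s, inspection_cost) for s in STATES))
```

The $10 tax makes the penny states the worst ones for every act, so the rule simply avoids any risk of the penalty there; even with free inspection, examining a coin still costs an expected $1 of penalty in those states.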

It is often pointed out that knowledge of L(a, s) implies economic 
knowledge not often available to the working statistician. This is true, 
and often means that the statistician cannot apply the minimax prin- 
ciple or otherwise (no matter what existing theory he may adopt) ad- 
vise his clients how to act. In such a case, he will generally have to 
recommend an action for each of several different loss functions, leav- 
ing it to his client to determine which loss function applies. Occasion- 
ally, however, it happens that some one action fits, or nearly fits, all 
“reasonable” loss functions. For example, in estimating the mean of a 
normal distribution from N observations, if the loss function depends 
only on the absolute difference between the true mean and the esti- 
mated mean and is bigger for bigger absolute differences, then choice 
of the sample mean is a minimax act. 

If a new possible act is adjoined to those originally available, a new 
minimax act will generally arise, and it may happen that this new 
act, though different from the old, assigns no probability to the newly 
available act, i.e., does not represent the new act at all in the mixture 
of unmixed acts which make it up. This has been regarded as paradoxi- 
cal, for why should an individual shift his preference from one act to 
another just because he can now act in a way which was not available 
before, but which in fact he does not choose at all [2]? From the point 
of view about the motivation of the minimax rule sketched above, 
there is no paradox, for when the new act is admitted the group may 
well change its choice to arrive at a compromise with some members 
who prefer the new possibility, without actually adopting the new possi- 
bility itself. 
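The phenomenon can be reproduced in a toy problem. The incomes below are entirely hypothetical; losses are formed as L(a, s) = (best income available in state s) − I(a, s), and the minimax mixture is located by brute-force search over mixtures with weights in steps of 1/20, a crude device chosen only for transparency. Adjoining act C raises the best income in the first state, which changes the losses of A and B and moves the minimax mixture from an even split of A and B to pure A, while C itself receives no weight.

```python
from fractions import Fraction
from itertools import product


def losses_from_incomes(incomes):
    """L(a, s) = (best income any available act earns in state s) - I(a, s)."""
    n_states = len(next(iter(incomes.values())))
    best = [max(inc[s] for inc in incomes.values()) for s in range(n_states)]
    return {act: [best[s] - inc[s] for s in range(n_states)] for act, inc in incomes.items()}


def grid_minimax(incomes, steps=20):
    """Minimax mixed act, searched over all mixtures whose weights are multiples of 1/steps."""
    losses = losses_from_incomes(incomes)
    acts = list(losses)
    n_states = len(losses[acts[0]])
    best_mix, best_value = None, None
    for counts in product(range(steps + 1), repeat=len(acts)):
        if sum(counts) != steps:
            continue  # not a probability distribution over the acts
        mix = {act: Fraction(c, steps) for act, c in zip(acts, counts)}
        worst = max(sum(mix[act] * losses[act][s] for act in acts) for s in range(n_states))
        if best_value is None or worst < best_value:
            best_mix, best_value = mix, worst
    return best_mix, best_value


# Hypothetical incomes I(a, s) in two states.
before = {"A": (10, 0), "B": (0, 10)}
after = {**before, "C": (20, -100)}
old_mix, old_value = grid_minimax(before)
new_mix, new_value = grid_minimax(after)
```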

There is a natural problem of estimating the parameter of a binomial 
distribution for a “large” sample in which the loss as a function of P, 
which is the unknown state in this case, for the minimax estimate dif- 
fers strikingly from that of the usual estimate, namely the success 
ratio. In this case, the loss is much lower for the usual estimate, except 
in the immediate neighborhood of P = ½, where it is a trifle higher.
Many say, and I with them, that the minimax estimate is in common
sense not as good here as the usual estimate.⁷ One way to analyze this
dissatisfaction is to admit that in the minimax problem too many 
opinions have been admitted as “reasonable.” In particular, an almost 





7 See especially page 190 of [6]. 
unshakable prejudice that the true value of P is almost precisely one-
half, does not seem reasonable in the usual situations giving rise to 
this problem. On the other hand, a clear-cut criterion of the reasonable 
opinions seems quite impractical even in this simple problem, so this 
example seems to me to raise a real difficulty for the theory. 
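The text does not name the loss function. Assuming squared-error loss (an assumption, though it is the classical setting for this comparison), the minimax estimate is (X + √N/2)/(N + √N), whose risk is the same for every P, while the success ratio X/N has risk P(1 − P)/N. The sketch computes both risks and the half-width of the interval around P = ½ in which the success ratio does worse; the function names are illustrative.

```python
from math import sqrt


def risk_success_ratio(p, n):
    """Mean squared error of the usual estimate X/n."""
    return p * (1 - p) / n


def risk_minimax(n):
    """Mean squared error of (X + sqrt(n)/2) / (n + sqrt(n)); it does not depend on p."""
    return n / (4 * (n + sqrt(n)) ** 2)


def worse_region_half_width(n):
    """X/n has the larger risk exactly when |p - 1/2| is below this value."""
    return sqrt(0.25 - n * risk_minimax(n))
```

At N = 10 000 the success ratio is worse only for P between roughly 0.43 and 0.57, and elsewhere its risk falls far below the constant risk of the minimax estimate.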

Wald’s book presents almost all the technical information heretofore 
known on the theory of statistical decision, together with some new 
material. The treatment and many of the topics are quite advanced 
mathematically. The mathematical background expected of the reader 
is described in the Preface as “some knowledge of probability, including 
probability distributions in the infinite dimensional space...” and 
“a knowledge of calculus and some familiarity with the elements of set,
measure, and integration theories...”. This knowledge will not be
possessed, or even be easily attainable, by the great majority of com- 
petent practicing and creative statisticians today; and, as in so many 
other prefaces, these nominal requirements are gross understatements, 
except for that purely nominal reader of unlimited intelligence and 
perseverance. The book is, then, directed toward a small audience to- 
day; as a progress report on a field rapidly progressing under Wald’s 
own leadership, it will be obsolete tomorrow. 

It seems to me unfortunate that Wald has avoided such extra- 
mathematical questions as possible motivations for adopting the mini- 
max rule and the distinction between negative income and loss. He 
does mention that mathematical study of the minimax rule leads, 
through a technical artifice, to a certain theorem of interest in itself, 
irrespective of interest in the minimax principle as a rule for action; 
but this is not, and is not presented as, a reason for adopting it. Read- 
ing between the lines, it appears that Wald means to work with loss 
and not negative income. For example, on p. 124 he says that if a cer- 
tain experiment is to be done, and the only thing to decide is what to 
do after seeing its outcome, then the cost of the experiment (which may 
well depend on the outcome) is irrelevant to the decision; this state- 
ment is right for loss but wrong for negative income. 

I find the book difficult and irritating to read, which is especially 
disappointing to one who has had the pleasure of hearing Wald lecture. 
The technical nomenclature and notations seem unnecessarily com- 
plicated and unsuggestive. Such judgments are at best subjective, but a 
few examples may convey my meaning: (i) On p. 4, allusion is made to 
a “finite set of integers i₁, ⋯, iₖ which are pairwise different.” In
no context could “pairwise” modify the meaning of “different,” and in 
this context “different” does not modify the meaning of “integers,” 
so the technical-looking phrase beginning with “which” is doubly re- 
dundant. On the other hand, it is not mentioned that the set in question 
is supposed to be non-vacuous. (ii) The technical term “minimal com-
plete class” is introduced on p. 15, where a fact is immediately demon- 
strated which renders this term superfluous. (iii) In effect, an action a 
is called “uniformly” better than a′, if for every s, L(a, s) ≤ L(a′, s),
with inequality holding for at least one s. This is not at all consistent 
with the established mathematical meaning of “uniformly.” The word 
“distinctly,” for example, would be better. (iv) The subject matter of 
Chapter 2, zero-sum two person games, is such that every theorem 
which can be stated about it has, so to speak, a mirror image. The 
mirror image will generally differ from the theorem itself, but in so 
uniform a way that it is superfluous and therefore detrimental to state 
it separately, as Wald seems in several instances to agree, but of four 
theorems he states the mirror images in their full distracting glare of 
italics and special symbols. (v) Chapter 3 is seriously cluttered up 
with parallel treatments of discrete and absolutely continuous cases. 
This old source of inelegant bifurcation in statistics can always, in my 
experience, be unified by assuming simply that all probability measures 
in any given problem are absolutely continuous with respect to some 
one fixed measure of a certain type.⁸ If that device does not work here,
it would have been well worth reporting why; if it does work but was 
refrained from because it is relatively advanced, that was bad judg- 
ment, for any reader who had come thus far could take it in his stride. 
I hasten to say that I found no serious mistakes, and that blemishes 
such as these four do not detract a jot from Wald’s achievement. Many 
will think it pedantic to call attention to them at all; but statisticians 
are human, and they understand better if allowance is made for their 
little human weaknesses, such as finiteness of memory, and a tendency 
to attach old meanings to old terms. 

Wald’s report on the current state of the theory of statistical deci- 
sion is of great scholarly value, and its possible influence for the good 
on statistics, through the enthusiastic few who are able to study it, 
is inestimable. 


NOTE ADDED IN PROOF: 


Abraham Wald and his wife died in an airplane accident in India 
on December 13, 1950. The preceding article had gone to press shortly 
before. On hearing of Wald’s death, I thought at first that the article 
ought to be revised in proof or withdrawn altogether, because it would 
be inappropriate in several respects to write the same thing now. But 





8 Namely, a sum of a denumerable number of disjoint finite measures.





STATISTICAL DECISION 67 


on further thought, it seems right to let stand verbatim what was 
written when I believed Wald’s great work on statistical decision to be 
but a fraction of what he was about to achieve, adding these few words 
isolated from the paper itself. 

Wald’s death gives Statistical Decision Functions an altogether new 
significance. When it appeared, it was threatened with rapid obsoles- 
cence by the activity of Wald himself. Now progress will be immeasur- 
ably slower. For a long time to come, scholars in the field will turn to 
this book for what is stored up in it of the guidance and stimulation 
which used to flow so copiously from its author. 

A short obituary of Wald by Harold Hotelling, who introduced Wald 
to modern statistical theory, suggested much of his early research, and 
was responsible for his appointment at Columbia, will appear in The 
American Statistician early in 1951. It is planned that the totality of 
his work, which bears on many important topics in statistics in addition 
to those mentioned in my paper, will be reviewed in some detail by Jacob
Wolfowitz, Wald’s most distinguished student and later his colleague, 
in the Annals of Mathematical Statistics, probably in the first issue of 
1952. 
THE KOLMOGOROV-SMIRNOV TEST FOR
GOODNESS OF FIT 


Frank J. Massey, Jr. 
University of Oregon 


The test is based on the maximum difference between an 
empirical and a hypothetical cumulative distribution. Per- 
centage points are tabled, and a lower bound to the power 
function is charted. Confidence limits for a cumulative dis- 
tribution are described. Examples are given. Indications that 
the test is superior to the chi-square test are cited. 


1. INTRODUCTION 


FREQUENTLY a statistician is called upon to test some hypothesis
about the distribution of a population. If the test is concerned with
the agreement between the distribution of a set of sample values and 
a theoretical distribution we call it a “test of goodness of fit.” 

Some tests have been developed in which the sampling distribution 
of the test statistic depends explicitly upon the form of, or the value 
of some parameter in, the distribution of the population. For example,
in the test for normality which uses the \(g_1\) and \(g_2\) statistics (see Sne-
decor [11] page 176) the distributions of \(g_1\) and \(g_2\) are dependent on the
form of the population. Similarly, the statistic \(t=\sqrt{N}(\bar{x}-\mu)/s\) has
Student’s distribution only if the population is normal.

Attempts have been made to find test statistics whose sampling 
distribution does not depend upon either the explicit form of, or the 
value of certain parameters in, the distribution of the population. 
Such tests have been called non-parametric or distribution-free tests. 
Probably the most widely used of such tests is the \(\chi^2\) test.

In this paper an alternative distribution-free test of goodness of fit 
is discussed, and some evidence is presented indicating that when it is 
applicable it may be a better all-around test than the chi-square test. 
Also, a technique for estimating the cumulative distribution of a pop- 
ulation is discussed, including a method of determining the necessary 
sample size for desired precision. Only the case where the cumulative 
distribution of the population is continuous is discussed. This, of 
course, excludes discrete populations. 

The test for goodness of fit described here has been suggested by 
Kolmogorov [3], Smirnov [9], Scheffé [8], and Wolfowitz [14]. The 
limiting distribution of the test-statistic, d, was derived by Kolmo- 
gorov [3] and by Smirnov [9]. Feller [2] and Doob [1] have simplified 
and unified the proofs. A table of the limiting distribution was given
by Smirnov [10]. The method of evaluating the distribution of d for 
small samples was given by Massey [5], as was the construction of the 
lower bound to the power function [6]. 


2. THE TEST 


Suppose that a population is thought to have some specified cumulative
frequency distribution function, say \(F_0(x)\). That is, for any specified
value of \(x\), the value of \(F_0(x)\) is the proportion of individuals in
the population having measurements less than or equal to \(x\). The
cumulative step-function of a random sample of N observations is expected
to be fairly close to this specified distribution function. If it is not close 
enough, this is evidence that the hypothetical distribution is not the 
correct one. 

If \(F_0(x)\) is the population cumulative distribution, and \(S_N(x)\) the
observed cumulative step-function of a sample (i.e., \(S_N(x) = k/N\), where
\(k\) is the number of observations less than or equal to \(x\)), then the
sampling distribution of \(d = \max_x |F_0(x) - S_N(x)|\) is known, and is
independent of \(F_0(x)\) if \(F_0(x)\) is continuous.
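As a minimal sketch of the statistic just defined, assuming F₀ is supplied as an ordinary Python function: because S_N(x) is a step function that jumps from (i − 1)/N to i/N at the i-th smallest observation, the maximum can be found by comparing F₀ with both step heights at each observation. The names ks_statistic and uniform_cdf are illustrative.

```python
from typing import Callable, Sequence


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """d = max over x of |F0(x) - S_N(x)| for a completely specified continuous F0.

    Just below the i-th smallest observation S_N equals (i - 1)/N, and at it S_N
    equals i/N, so only these 2N comparisons are needed.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def uniform_cdf(x: float) -> float:
    """F0 for the uniform distribution on [0, 1], used here as an example."""
    return min(max(x, 0.0), 1.0)


d = ks_statistic([0.1, 0.4, 0.7], uniform_cdf)
```

For the three observations shown the largest gap is just at 0.7, where S_N reaches 1 while F₀ is 0.7, so d is 0.3.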

Table 1 gives certain critical points of the distribution of d for vari- 
ous sample sizes. For example, at a 0.20 level of significance, the critical 
value of d for N = 10 is 0.322; this means that in 20 per cent of random 
samples of size 10, the maximum absolute deviation between the sample
cumulative distribution and the population cumulative distribution
will be at least 0.322. The values in Table 1 for N ≤ 35 were computed
by the procedure described in [5]; those for N > 35 are from Smirnov’s
table [10]. The values in the table are believed not to be in error by
more than 4 units in the last figure shown for N ≤ 20, and by not more
than 0.005 for N = 25, 30, 35.
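The meaning of a tabled critical value can be checked by simulation. Since d is distribution-free when F₀ is continuous, it suffices to sample from the uniform distribution on (0, 1), for which F₀(x) = x. The sketch estimates how often d reaches 0.322 in samples of 10; the seed, the number of trials and the function names are arbitrary choices.

```python
import random


def ks_against_uniform(sample):
    """d for a sample when F0(x) = x on (0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - x, x - (i - 1) / n) for i, x in enumerate(xs, start=1))


def exceedance_rate(n, critical, trials=20_000, seed=1951):
    """Fraction of simulated samples of size n whose d is at least `critical`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if ks_against_uniform([rng.random() for _ in range(n)]) >= critical:
            hits += 1
    return hits / trials


rate = exceedance_rate(10, 0.322)  # expected near 0.20, up to simulation error
```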


3. APPLICATIONS 


Our procedure is to draw the hypothetical cumulative distribution
function on a graph and to draw curves a distance \(d_\alpha(N)\) above and
below the hypothetical curve (see Figure 1). If \(S_N(x)\) passes outside of
this band at any point we will reject, at the \(\alpha\) level of significance, the
hypothesis that the true distribution is \(F_0(x)\). Thus, in the example
shown the hypothetical curve is rejected. Only part of the observed
distribution has been plotted in Figure 1; if it were plotted completely,
it would, of course, rise to 1.0 on the vertical scale. Once the observed
curve passes out of the acceptance band, the theoretical curve is re- 
TABLE 1. Critical values, \(d_\alpha(N)\), of the Maximum Absolute Difference
between Sample and Population Cumulative Distributions.

Values of \(d_\alpha(N)\) such that \(\Pr\{\max_x |S_N(x)-F_0(x)| > d_\alpha(N)\} = \alpha\), where \(F_0(x)\) is
the theoretical cumulative distribution and \(S_N(x)\) is an
observed cumulative distribution for a sample of N.

Sample size    Level of significance
    (N)           α = 0.05
     1             0.975
     2             0.842
     3             0.708
     4             0.624
     5             0.565
     6             0.521
     7             0.486
     8             0.457
    ...             ...
  Over 35        1.36/√N
Figure 1. Graphical Method of Applying the d Test.


The continuous curve represents the theoretical distribution, and the broken
curves are at distance \(\pm d_\alpha(N)\) from it, \(d_\alpha(N)\) being given by Table 1. The step-
function represents part of the observed distribution. Reject unless the step-
function lies entirely between the broken curves.


jected regardless of the later behavior of the observed curve. An alter- 
native, and perhaps simpler, scheme is to record in a table the observed 
and hypothetical distributions and calculate the maximum deviation 
between them. If this exceeds \(d_\alpha(N)\), we reject the hypothetical
distribution.

As an example of the application of this test of goodness of fit we 
shall use data given by Snedecor ([11], p. 59). The results of a sam- 
pling experiment are compared with a theoretical normal distribution. 
His cumulative frequencies are recorded in Table 2. 

The maximum deviation in the absolute frequencies, which occurs 
at the boundary score 30.5, is 12.41, which represents a difference in 
the proportions of 12.41/511 = 0.024. The 5 per cent significance point
is given in the last row of Table 1 as \(1.36/\sqrt{511} = 0.060\). The observed
value of d is less than the critical value, so we would accept, at the 5 





AMERICAN STATISTICAL ASSOCIATION JOURNAL, MARCH 1951 


TABLE 2. Comparison of Observed and Theoretical Frequencies in
Sampling from a Normal Population
(Snedecor [11], p. 59)

Upper        Cumulative frequency to upper
boundary     boundary of class                  Absolute
of class     Observed      Theoretical          difference
  39.5         511           511.00               0.00
  38.5         510           509.16               0.84
  37.5         510           506.45               3.55
  36.5         505           500.83               4.17
  35.5         493           490.05               2.95
  34.5         469           471.45               2.45
  33.5         447           442.43               4.57
  32.5         402           401.29               0.71
  31.5         356           349.22               6.78
  30.5         300           287.59              12.41*
  29.5         228           223.36               4.64
  28.5         162           161.73               0.27
  27.5         114           109.66               4.34
  26.5          73            68.52               4.48
  25.5          43            39.50               3.50
  24.5          24            20.90               3.10
  23.5          14            10.12               3.88
  22.5           9             4.50               4.50
  21.5           2             1.79               0.21
  20.5           2             0.61               1.39
  19.5           1             0.20               0.80

* Maximum absolute difference = 12.41.
  Hence d = 12.41/511 = 0.024.


per cent level of significance, the hypothesis that the population dis- 
tribution is that recorded in Table 2. 
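The tabular scheme applied to Snedecor's figures can be reproduced in a few lines. The cumulative frequencies are copied from Table 2; the class boundaries are assumed to run from 39.5 downward in steps of 1, and 1.36/√N is the large-sample 5 per cent point quoted above.

```python
from math import sqrt

# Cumulative frequencies from Table 2, listed from the top class boundary down.
boundaries = [39.5 - k for k in range(21)]  # 39.5, 38.5, ..., 19.5 (assumed spacing)
observed = [511, 510, 510, 505, 493, 469, 447, 402, 356, 300, 228,
            162, 114, 73, 43, 24, 14, 9, 2, 2, 1]
theoretical = [511.00, 509.16, 506.45, 500.83, 490.05, 471.45, 442.43, 401.29,
               349.22, 287.59, 223.36, 161.73, 109.66, 68.52, 39.50, 20.90,
               10.12, 4.50, 1.79, 0.61, 0.20]
N = 511

diffs = [abs(o - t) for o, t in zip(observed, theoretical)]
i_max = max(range(len(diffs)), key=diffs.__getitem__)  # row of the largest discrepancy
d = diffs[i_max] / N                                   # convert to a difference in proportions
critical = 1.36 / sqrt(N)                              # large-sample 5 per cent point
reject = d > critical
```

The largest discrepancy falls at the boundary 30.5, d is about 0.024, the critical value is about 0.060, and the hypothesis is not rejected.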

Grouping observations into intervals tends to lower the value of d. 
For grouped data, therefore, the appropriate significance levels are 
smaller than those tabled. For large samples, grouping usually will 
cause little change in the appropriate significance levels. However, 
grouping into a very small number of categories can cause important 
changes for any sample size. 
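Why grouping can only lower d is easy to see: a grouped table permits the comparison only at class boundaries, and a maximum over fewer points cannot exceed the maximum over all of them. The sketch checks this on simulated uniform samples; the groupings into 4 and 20 equal classes, the seed and the function names are hypothetical.

```python
import random


def ks_exact(sample):
    """d against F0(x) = x on (0, 1), maximizing over every x."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - x, x - (i - 1) / n) for i, x in enumerate(xs, start=1))


def ks_grouped(sample, boundaries):
    """d when the comparison is made only at class boundaries, as in a grouped table."""
    n = len(sample)
    return max(abs(sum(x <= b for x in sample) / n - b) for b in boundaries)


rng = random.Random(7)
coarse = [k / 4 for k in range(1, 4)]  # hypothetical grouping: 4 equal classes
fine = [k / 20 for k in range(1, 20)]  # hypothetical grouping: 20 equal classes
samples = [[rng.random() for _ in range(50)] for _ in range(200)]
pairs = [(ks_exact(s), ks_grouped(s, coarse), ks_grouped(s, fine)) for s in samples]
```

Each grouped value is at most the ungrouped one, and the coarse grouping gives the smaller values on average, so the tabled critical points are conservative for grouped data.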

As another application, consider testing normality by observing 
whether or not the sample cumulative distribution drawn on arith- 
metic probability paper is approximately straight. There are no theo- 
retical results, at present, which indicate how close to straight the 
observed sample cumulative curve should be. The d test is correctly
used only if the distribution is completely specified (i.e., not only as 
normal, but as normal with a specified mean and a specified standard 
deviation). The distribution of the maximum deviation is not known 
when certain parameters of the population have been estimated from 
the sample. It may be expected, however, that the effect of adjusting 
the population mean and standard deviation to those of the sample, 
either by calculation or by visually fitting a straight line on normal 
probability paper, will be to reduce the critical level of d. If the value 
of \(d_\alpha(N)\) shown in Table 1 is exceeded in these circumstances, we may
safely conclude that the discrepancy is significant, i.e., that the distri-
bution is not normal.¹
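A rough analogue of the footnoted sampling experiment can be run with the mean and standard deviation computed from each sample, rather than by fitting a line by eye, so its figures will not match the footnote's exactly. The sketch draws samples of 10 from a normal population, fits a normal distribution to each, and reports the 95th percentile of d for comparison with the tabled 0.41; the seed, trial count and function names are arbitrary.

```python
import random
from statistics import NormalDist, mean, stdev


def ks_stat(sample, cdf):
    """d = max |F(x) - S_N(x)| for a continuous F given as a function."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n) for i, x in enumerate(xs, start=1))


def fitted_percentile(n=10, trials=4000, q=0.95, seed=1951):
    """q-quantile of d when the normal mean and s.d. are estimated from each sample."""
    rng = random.Random(seed)
    ds = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        fitted = NormalDist(mean(sample), stdev(sample))  # parameters taken from the sample
        ds.append(ks_stat(sample, fitted.cdf))
    ds.sort()
    return ds[int(q * trials) - 1]


q95 = fitted_percentile()
```

The simulated percentile comes out well below 0.41, in line with the footnote's conclusion that fitting the parameters lowers the critical level of d.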


4. POWER OF THE TEST 


Suppose we indicate by \(F_1(x)\) an alternative form of the distribution
function. Let \(\Delta\) be the maximum absolute difference between \(F_0(x)\)
and \(F_1(x)\). This measurement of distance between alternatives has been
used by Mann and Wald [4] and by Williams [12]. 

For large samples it has been shown [6] that the power of the d test 
(i.e., the probability of rejecting the hypothetical distribution) is never 
less than 


\[
1-(2\pi)^{-1/2}\int_{2(\Delta\sqrt{N}-\sqrt{N}\,d_\alpha(N))}^{2(\Delta\sqrt{N}+\sqrt{N}\,d_\alpha(N))} e^{-t^{2}/2}\,dt. \qquad (1)
\]


Since this is a poor lower bound to the power, the actual power is likely
to be much larger. As is shown in Section 5, however, it is of value in
comparing the d test with the \(\chi^2\) test. Figure 2 shows this lower bound
for the 5 and 1 per cent levels of significance. 
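Bound (1) is straightforward to evaluate with the error function: it is one minus the standard normal probability between 2(Δ√N − √N·dα(N)) and 2(Δ√N + √N·dα(N)). The sketch assumes that for large N, √N·dα(N) may be replaced by its limiting constant, 1.36 at the 5 per cent level (as used in the text) and 1.63 at the 1 per cent level (the standard limiting value, not quoted in this section); the function names are illustrative.

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def power_lower_bound(delta_sqrt_n, lam):
    """Lower bound (1) on the power of the d test.

    delta_sqrt_n is Delta * sqrt(N); lam stands for sqrt(N) * d_alpha(N),
    about 1.36 at the 5 per cent level and 1.63 at the 1 per cent level.
    """
    upper = 2.0 * (delta_sqrt_n + lam)
    lower = 2.0 * (delta_sqrt_n - lam)
    return 1.0 - (std_normal_cdf(upper) - std_normal_cdf(lower))
```

power_lower_bound(1.36, 1.36) is essentially 0.50 and power_lower_bound(2.0, 1.63) is about 0.77, the two readings taken from Figure 2 in the examples that follow.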

Figure 2 can be used to indicate the sample size necessary so that,
at the 5 per cent level of significance, the d test of F₀(x) has power at
least 0.50 against the alternative F₁(x). Suppose that the maximum
absolute difference between F₀(x) and F₁(x) is 0.2. Reading across from
0.50 on the vertical scale we see that the α = 0.05 curve is intersected
above Δ√N = 1.36. Solving this for N we find N = (1.36/0.2)² = 46.24.
A sample of 47 would, therefore, be required.

¹ A sampling experiment was conducted by the writer in which 100 samples of size 10 were drawn
from a known normal distribution. The cumulative distribution for each sample was plotted on
arithmetic normal probability paper and a straight line was fitted by eye. The observed percentiles of the
distribution of d were considerably lower than those given by Table 1. The 95th percentile was 0.29 as
compared with 0.41 in Table 1, and the 90th percentile was 0.25 as compared with 0.37. This implies
that deviations greater than those in Table 1 should, in these applications, be treated as a very strong
indication of departure from normality. In the sampling experiment only 1 observation of the 100
exceeded the 20 per cent critical value from Table 1 and no observation exceeded the 10 per cent critical
value.

As another example, suppose we have a sample of 400 and that
Δ = 0.10. If we test the hypothetical distribution F₀(x) at the 1 per cent
level of significance, we can determine a lower bound to the chance of
rejecting it if the true distribution is F₁(x). Here Δ√N = 0.10 × 20 = 2,
and above 2 on the horizontal scale we see that the 1 per cent curve is
at a height of 0.77. The power of this test is, thus, at least 0.77 against
the particular alternative F₁(x); i.e., if F₁(x) is correct, we have at
least a 77 per cent chance of detecting that F₀(x) is incorrect.

FIGURE 2. Lower bounds for the power of the d test for α = 0.01 and α = 0.05.
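As a rough numerical check of the two examples, the sketch below evaluates the lower bound (1) directly instead of reading it from Figure 2. It assumes the large-sample approximation d_α(N) ≈ λ_α/√N with λ = 1.36 at the 5 per cent level and λ = 1.63 at the 1 per cent level (the constants used elsewhere in this section); the function names are illustrative only.

```python
import math

# Large-sample constants (assumed): d_alpha(N) is about LAMBDA[alpha] / sqrt(N).
LAMBDA = {0.05: 1.36, 0.01: 1.63}


def norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def power_lower_bound(delta: float, n: int, alpha: float) -> float:
    """Lower bound (1) on the power of the d test against an alternative
    whose maximum distance from the hypothetical distribution is delta."""
    root_n = math.sqrt(n)
    d_alpha = LAMBDA[alpha] / root_n
    lower = 2.0 * root_n * (delta - d_alpha)
    upper = 2.0 * root_n * (delta + d_alpha)
    return 1.0 - (norm_cdf(upper) - norm_cdf(lower))


def sample_size_for_half_power(delta: float, alpha: float) -> int:
    """Smallest N with delta * sqrt(N) >= LAMBDA[alpha]; there the lower
    limit of the integral in (1) is about 0, so the bound is about 0.50."""
    return math.ceil((LAMBDA[alpha] / delta) ** 2)


print(sample_size_for_half_power(0.2, 0.05))          # 47
print(round(power_lower_bound(0.10, 400, 0.01), 2))   # 0.77
```

Because the upper limit of the integral is large in both examples, the bound is essentially one minus the normal tail area above 2√N(Δ − d_α(N)), which is why it crosses 0.50 almost exactly at Δ√N = λ_α.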


5. COMPARISON OF THE d TEST WITH THE χ² TEST


Mann and Wald [4] have given a technique for deciding on an
optimum number of class intervals for the application of the χ² test for
goodness of fit. The intervals and sample size are so chosen that the
probability of rejecting F₀(x) as the true distribution, if F₁(x) is
actually the true distribution, is never less than 0.5. More important for
our purposes is the fact that there will be one alternative distribution,
F₂(x), at a distance Δ from F₀(x), such that for the χ² test the
probability of rejecting F₀(x), if F₂(x) is the true distribution function, is as
close as desired to 0.5.

Williams [12] has presented a table showing, for various sample
sizes, minimum distances for which the power of the χ² test is not less
than 0.5. Part of his table is reproduced in Table 3, together with
minimum distances for which the d test has power not less than 0.5. The
discrepancies detectable by the d test are all smaller than those for
the χ² test. This implies that the d test, at least at the 50 per cent power
level, will detect smaller deviations in cumulative distributions than
will the χ² test.


TABLE 3. Minimum Deviation of Actual from Assumed Population that
is Detectable with Probability 0.50 by the χ² and d Tests at
the 5 per cent and 1 per cent Levels of Significance*

            α = .05             α = .01
   N     χ² test  d test    χ² test  d test
  200    0.1605   0.096     0.1847   0.115
  250    0.1469   0.086     0.1657   0.103
  300    0.1343   0.079     0.1577   0.094
  350    0.1284   0.073     0.1479   0.087
  400    0.1213   0.068     0.1369   0.082
  450    0.1157   0.064     0.1315   0.077
  500    0.1112   0.061     0.1273   0.073
  550    0.1052   0.058     0.1209   0.070
  600    0.1024   0.055     0.1184   0.067
  650    0.1000   0.053     0.1137   0.064
  700    0.0961   0.051     0.1120   0.062
  750    0.0945   0.050     0.1083   0.060
  800    0.0914   0.048     0.1051   0.058
  850    0.0887   0.047     0.1022   0.056
  900    0.0877   0.045     0.0997   0.054
  950    0.0855   0.044     0.0974   0.053
 1000    0.0834   0.043     0.0953   0.052
 1100    0.0812   0.041     0.0918   0.049
 1200    0.0782   0.039     0.0888   0.047
 1300    0.0757   0.038     0.0862   0.045
 1400    0.0734   0.036     0.0841   0.044
 1500    0.0715   0.035     0.0823   0.042
 2000    0.0629   0.030     0.0728   0.036

* The deviation between two populations is measured by the maximum absolute difference between
their cumulative distributions. The values for the χ² test are taken from [12]; those for the d test are
computed from formula (1).
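Under the same large-sample approximation as formula (1), the d-test entries of Table 3 appear to follow Δ = λ_α/√N, since that is where the bound (1) is close to 0.50. The short sketch below, which assumes λ = 1.36 (α = .05) and λ = 1.63 (α = .01), reproduces a few of those entries; it does not attempt Williams' χ² values, which come from [12].

```python
import math

# Assumed large-sample constants: d_alpha(N) is about LAMBDA[alpha] / sqrt(N).
LAMBDA = {0.05: 1.36, 0.01: 1.63}


def d_test_min_deviation(n: int, alpha: float) -> float:
    """Deviation detectable with probability about 0.50 by the d test:
    the bound (1) is near 0.50 when delta * sqrt(N) equals LAMBDA[alpha]."""
    return LAMBDA[alpha] / math.sqrt(n)


for n in (200, 800, 2000):
    row = [round(d_test_min_deviation(n, a), 3) for a in (0.05, 0.01)]
    print(n, row)
# 200 [0.096, 0.115]
# 800 [0.048, 0.058]
# 2000 [0.03, 0.036]
```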
Other points of comparison between the χ² and d tests may be noted:


(i) In general, the power of the χ² test is not known (Mann and Wald
[4] considered only the case where it is 0.5), whereas a lower
bound to the power of the d test for any alternative can be read
from Figure 2.

(ii) The d test treats individual observations separately and thus does
not lose information by grouping, as the χ² test necessarily does.
In small samples this loss of information in χ² procedures is large,
since wide class intervals must be used; and for very small
samples χ² is not applicable at all. This, together with the information
in Table 3, suggests that the d test may always be more powerful
than χ² tests.

(iii) d will usually require less computation than χ². This is especially
true when a graphical test is used, as illustrated in Figure 1, for
if the hypothesis is rejected the computation stops at the point
of rejection. Graphing might be convenient if a standard hypothesis
is tested repeatedly, since a master test chart could be
prepared. There are also instances where individuals can be ranked
easily according to size, and then the individuals can be measured
one at a time starting with the smallest. After each individual is
measured, the cumulative distribution can be checked to see if d
exceeds d_α(N). Using this sequential procedure it might be
possible to avoid the actual measurement of many of the individuals.
This might be especially useful if the ranking technique were
fast and inexpensive while actually measuring was slow and
expensive.

(iv) In cases where parameters must be estimated from the sample
the χ² test is easily modified by reducing the number of degrees
of freedom. The d test has no such known modifications so, except
for the remarks in Section 3, is not applicable in such cases.

(v) As yet the d test cannot be applied to discrete populations,
whereas the χ² test can be.
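The sequential procedure described in (iii) can be sketched as follows. This is a hypothetical illustration: the function name, the uniform hypothetical distribution in the usage line, and the use of 0.41 (the Table 1 value for N = 10 quoted in footnote 1) are assumptions for demonstration. Because the sample cumulative distribution is a step function, its largest deviation from a continuous F₀(x) occurs just before or at one of the ordered observations, so comparing each new measurement with both sides of the step is enough.

```python
from typing import Callable, Iterable, Optional


def sequential_d_test(sorted_values: Iterable[float], n: int,
                      cdf0: Callable[[float], float],
                      crit: float) -> Optional[int]:
    """Examine measurements in increasing order, stopping at the first one
    where the sample cumulative distribution departs from cdf0 by more
    than crit. Returns how many individuals were measured before the
    hypothesis was rejected, or None if all n passed."""
    for k, x in enumerate(sorted_values, start=1):
        f = cdf0(x)
        # The sample distribution steps from (k - 1)/n to k/n at the k-th smallest value.
        deviation = max(abs(f - (k - 1) / n), abs(f - k / n))
        if deviation > crit:
            return k
    return None


def uniform_cdf(x: float) -> float:
    """Hypothetical F0: the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)


# Ten items whose smallest value is already 0.5: rejected after one measurement.
print(sequential_d_test([0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95],
                        10, uniform_cdf, 0.41))   # 1
```

Passing a generator as `sorted_values` means each measurement is taken only when it is needed, so the loop stops measuring as soon as rejection occurs.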





6. CONFIDENCE LIMITS FOR THE TRUE CUMULATIVE
DISTRIBUTION FUNCTION²


Table 1 can be used to find confidence limits for the true cumulative
distribution function, say F(x). Thus 100(1 − α) per cent confidence
limits for F(x) are


$$S_N(x) - d_\alpha(N) < F(x) < S_N(x) + d_\alpha(N).$$


² See reference [13].
For example, for a sample of size 100, one can be 95 per cent sure
that S_N(x) will stay within 1.36/√100 = 0.136 of the true distribution.
As another example, suppose it is desired to perform a large scale
(“Monte Carlo” method [7]) sampling experiment to study some
distribution. To be 99 per cent sure of estimating the cumulative
sampling distribution within, say, 2 percentage points for the entire curve,
the necessary sample size is found as follows: From Table 1, we find
d_.01(N) = 1.63/√N = 0.02. Hence √N = 81.5 and N = 6643.


TABLE 4. Confidence Limits for the Cumulative
Distribution Function Shown in Table 2

Upper Boundary   Lower Confidence   Observed Cumulative   Upper Confidence
of Class         Limit              Proportion            Limit
39.5             0.940              1.000                 1.000
38.5             0.938              0.998                 1.000
37.5             0.938              0.998                 1.000
36.5             0.928              0.988                 1.000
35.5             0.905              0.965                 1.000
34.5             0.858              0.918                 0.978
33.5             0.815              0.875                 0.935
32.5             0.727              0.787                 0.847
31.5             0.637              0.697                 0.757
30.5             0.527              0.587                 0.647
29.5             0.386              0.446                 0.506
28.5             0.257              0.317                 0.377
27.5             0.163              0.223                 0.283
26.5             0.083              0.143                 0.203
25.5             0.024              0.084                 0.144
24.5             0                  0.047                 0.107
23.5             0                  0.027                 0.087
22.5             0                  0.018                 0.078
21.5             0                  0.004                 0.064
20.5             0                  0.004                 0.064
19.5             0                  0.002                 0.062
18.5             0                  0                     0.060

As a final example, consider the data in Table 2. Ninety-five per cent
confidence limits for the true distribution curve are obtained by adding
and subtracting 1.36/√511 = 0.060 from the observed distribution as
shown in Table 4. Note that the theoretical cumulative distribution,
recorded in Table 2, is completely inside the limits and thus
would be accepted at the 5 per cent level of significance.
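The confidence-band calculations of this section reduce to a few lines. The sketch below assumes the large-sample approximation d_α(N) ≈ λ/√N with λ = 1.36 (95 per cent) or 1.63 (99 per cent), and clips the limits to the interval [0, 1] as Table 4 does; the function names are illustrative.

```python
import math


def confidence_band(cum_props, n, lam=1.36):
    """Limits S_N(x) - d and S_N(x) + d, with d = lam / sqrt(n),
    clipped to [0, 1] because F(x) is a probability."""
    d = lam / math.sqrt(n)
    return [(max(0.0, p - d), min(1.0, p + d)) for p in cum_props]


def sample_size_for_band(half_width, lam):
    """Smallest N for which lam / sqrt(N) does not exceed half_width."""
    return math.ceil((lam / half_width) ** 2)


print(sample_size_for_band(0.02, 1.63))   # 6643
for lo, hi in confidence_band([0.998, 0.317, 0.047], 511):
    print(round(lo, 3), round(hi, 3))
# 0.938 1.0
# 0.257 0.377
# 0.0 0.107
```

The three observed proportions in the usage lines are taken from Table 4 (class boundaries 38.5, 28.5 and 24.5).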
A LARGE SAMPLE t-STATISTIC WHICH IS INSENSITIVE
TO NON-RANDOMNESS


John E. Walsh
The Rand Corporation 


Most well-known significance tests and confidence intervals for the
population mean are based on the assumption of a random sample.
This paper considers how the significance levels and confidence
coefficients of the commonly used class of tests and intervals based on
the standard Student t-statistic are changed when the random sample
requirement is violated and the number of observations is large. It is
found that even a slight deviation from randomness can result in a
substantial change in significance level and confidence coefficient. This
class of tests and confidence intervals thus seems to be of questionable
practical value for large sets of observations. Large sample tests and
confidence intervals for the mean which are not sensitive to the
randomness requirement are obtained for a situation of practical interest
by development of a special type of t-statistic. For the case of a random
sample, these tests are as efficient (asymptotically) as those based on
the standard t-statistic.

Introduction and Statement of Results. In deriving statistical tests
and confidence intervals, certain assumptions are made. When these
tests and intervals are applied to practical situations, their validity
depends upon how closely the assumptions are approximated. One
assumption frequently made is that a set of observations is a random
sample; i.e., that the observations are


(a) Statistically independent
(b) From the same population (i.e., have the same univariate
distribution function).


One of the principal purposes of this paper is to study how sensitive 
a commonly applied class of large sample tests and confidence intervals 
for the population mean is to violation of one or both of assumptions 
(a) and (b). 

Let the values of the n observations used for a test or confidence
interval be denoted by x₁, ..., x_n. The problem considered is that of
obtaining significance tests and confidence intervals for the expected
value of the sample mean x̄ (= Σ x_i/n). This expected value will be
denoted by μ; i.e., E(x̄) = μ.

The commonly used class of tests and confidence intervals for μ
investigated here consists of those based on the quantity

$$\sqrt{n}\,(\bar{x} - \mu) \Big/ \sqrt{\textstyle\sum_{i=1}^{n} (x_i - \bar{x})^2/(n-1)}. \qquad (1)$$





If the n observations are a random sample from a population for which
the first two moments exist (almost all populations used to approximate
practical situations satisfy this property), the distribution of (1) is
nearly normal with zero mean and unit variance for n sufficiently large.
The quantity (1) is the well-known Student t-statistic; if the
observations were a random sample from a normal population, this quantity
would have a Student t-distribution with n − 1 degrees of freedom
(n ≥ 2).
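For reference, statistic (1) can be computed as in the plain-Python sketch below; the function name is illustrative, and the usage values (five observations and a hypothetical μ = 4) are arbitrary.

```python
import math


def student_t(xs, mu):
    """Statistic (1): sqrt(n) (xbar - mu) / sqrt(sum (x_i - xbar)^2 / (n - 1))."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are needed")
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)   # sum of squared deviations
    return math.sqrt(n) * (xbar - mu) / math.sqrt(ss / (n - 1))


print(round(student_t([4, 3, 3, 5, 7], mu=4), 4))   # 0.5345
```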

For large values of n, it is found that the distribution of (1) is very 
sensitive to assumptions (a) and (b). For example, let us consider vio- 
lation of (a). The introduction of an average correlation as small as 
.001, or even .0001, can result in substantial changes in the values of 
significance levels and confidence coefficients for the class of tests and 
intervals based on (1). It is possible that such correlations exist for 
many practical situations where a random sample is assumed. Violation
of (b) can also cause trouble. For example, suppose that the
observations do not all have the same expected value; i.e., E(x_i) = μ_i
(i = 1, ..., n), and μ = Σ μ_i/n. Then the significance levels and
confidence coefficients of tests and intervals based on (1) can differ
noticeably from their hypothetical values if Σ (μ_i − μ)² is not
sufficiently small. It is likely that a random sample is assumed for practical
situations where this quantity is not sufficiently small.

Due to the great variety of ways in which (a) and (b) can be vio- 
lated, development of a general class of large sample tests and intervals 
for μ which are not sensitive to the random sample requirement would
seem very difficult if not impossible. There is, however, one type of 
situation which is of practical interest and for which suitable large 
sample results can often be developed. This type of situation is defined 
by several conditions which are of a technical nature. The procedure 
followed will be first to state the conditions and then to present the 
practical motivation behind their choice. For this case it is assumed 
that the n observations can be divided into m (≥ 1) subsets meeting the
following conditions: 


(i) Consider all possible pairings of the n observations. The sum of 
the covariances for all the pairs approximately equals the sum of 
the covariances for those pairs where both observations are from 
the same subset. 
(ii) For each subset, consider all possible pairings of the observations
in that subset. A group of these pairs can be chosen so that
1. The sum of the covariances for all the pairs of the subset
approximately equals the sum of the covariances for the pairs in
the group.
2. For each pair of the group, consider the difference of the
expected values of the observations of this pair. The sum of the
squares of the differences of these expected values is very much
less than the sum of the variances for the observations of the
subset.
3. In this group each observation of the subset occurs the same
number of times r, and the value of r is the same for all subsets.
4. The total number of pairs in all such groups (one group for each
subset) has an order of magnitude which does not exceed that
of n (say, ≤ 5n).

(iii) The group of (ii) can be increased by adding more pairs to form
an augmented group with the properties
1. The sum of the variances for the observations of the subset is
very much greater than (r + 1)/(s − r) times the sum of the
squares of the differences of the expected values for the
additional pairs used to increase the group of (ii) to the augmented
group.
2. In the augmented group each observation of the subset occurs
the same number of times s, where s has the same value for all
subsets and is greater than r.
3. The total number of pairs in all the augmented groups has an
order of magnitude which does not exceed that of n (say, ≤ 10n).


In many situations there will be several choices for the group of (ii) 
and the augmented group of (iii). For these cases, it would seem 
preferable to choose the group and augmented group so that s—r is 
maximum. Also, if several values are available for m, selecting m as 
small as possible would seem to be most appropriate.

First let us consider the practical motivation behind the selection of 
condition (i). Frequently a set of observations can be divided into sub- 
sets for which it is evident that the average correlation within subsets 
is very much greater than the average correlation between subsets. 
Then, if the subsets are not too small and the variances of the observa- 
tions do not differ greatly, condition (i) will usually be satisfied. 

Next let us consider the practical motivation for parts 1 and 2 of (ii). 
The intuitive reasoning used to decide how the correlation of one pair 
of observations of a subset compares with that of another pair
ordinarily consists in noting how “near” the observations are for each pair.
A subgroup consisting of pairs which are “near” would be taken to 
have a much higher average correlation than a subgroup consisting of 
pairs which are “distant.” This same reasoning holds with regard to 
expected values. The expected values for a pair of “near” observations 
often tend to be very nearly equal. If, in addition, the subsets are 
chosen so that the expected values of observations within a subset do 
not differ much, the expected values of “near” observations can fre- 
quently be considered equal for all practical purposes. In condition 
(ii), the requirement of being “equal for all practical purposes” is re- 
placed by the weaker requirement stated in part 2; this weaker re- 
striction is sufficient for the mathematical derivations. Some specific 
illustrations of what is meant by “near” and “distant” are furnished by 
the industrial and agricultural examples presented below. 

Part 3 of (ii) can often be satisfied by ordering the observations of a 
group according to how “near” the observations of a pair are; let this 
ordering begin with the “nearest” pairs. The first step is to choose a 
subgroup of the pairs which satisfies parts 1 and 2 of (ii). This is done 
for each subset. By adding further pairs which are near the beginning of 
the orderings for the corresponding subsets, the subgroups can fre- 
quently be extended into groups which satisfy the first three parts 
of (ii). 

Part 4 of (ii) represents one of the principal reasons for dividing the 
observations into subsets. This condition is important in the deriva- 
tions. If m is not too small, approximate verification of this part of (ii) 
is usually not difficult. As an example of the method used, suppose that 
n = 1600, m = 20 and each subset contains 80 observations. Then there
are 3160 possible pairings of observations for each subset. If, for each
subset, a group can be obtained which satisfies the first three parts of
(ii) and does not contain more than 400 pairs, then the total number
of pairs in all groups is at most 8000. Since 8000 ≤ 5n for this case,
part 4 of (ii) is satisfied.
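The counting in this example is easy to check mechanically. The sketch below uses Python's `math.comb` for the number of unordered pairs; the cap of 400 pairs per group is the figure assumed in the example, not a general rule.

```python
from math import comb

n, m, per_subset = 1600, 20, 80
max_pairs_per_group = 400               # cap assumed in the example

pairs_per_subset = comb(per_subset, 2)  # unordered pairs within one subset
total_group_pairs = m * max_pairs_per_group

print(pairs_per_subset)                                 # 3160
print(total_group_pairs, total_group_pairs <= 5 * n)    # 8000 True
```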

The upper limit of 5n in part 4 was chosen for the sake of definiteness. 
In practice, one never has an unlimited number of observations and it 
is necessary to decide what is to be meant by “the same order of mag- 
nitude as n.” The upper limit selected should be suitable for many 
applications. 

The augmented groups of (iii) are required in the derivations. The 
reasons for believing that (iii) may be satisfied in practical situations 
are similar to the reasons presented for (ii). The problem is to increase 
the size of the groups of pairs found in (ii) in such a manner that (iii)
is satisfied. This can often be done by supplementing the groups of (ii) 
with additional pairs near the beginning of the orderings described in 
(ii). The first part of (iii) is merely another weakened form of the 
property “equal for all practical purposes” discussed when considering 
the motivation for (ii). Part 3 is needed in the derivations; the upper 
limit of 10n is based on the same considerations as the upper limit of 
5n in part 4 of (ii). 

Before considering examples of situations where (i)-(iii) appear likely
to be satisfied, let us discuss what these examples represent. The
decision as to when (i)-(iii) are satisfied is purely a matter of judgement.
The validity of these conditions depends on the particular situation
considered. The decision to accept or reject (i)-(iii) is made on the basis
of the intuition and past experience of the persons concerned with the
experiment. The purpose of the examples is merely to call attention to
situations where there is a strong possibility that (i)-(iii) are
acceptable. For this reason the examples will be liberally sprinkled with phrases
of the type: “it seems reasonable to assume,” “it appears likely,” etc. 

Let us consider an industrial example of a situation where conditions 
(i)-(iii) would appear to be satisfied. Here the observations consist of 
measurements of some specified characteristic of an item which is pro- 
duced by a machine. This machine is operated for an eight-hour shift 
five days a week. Since the machine is inoperative for at least sixteen 
hours between any two days, it seems reasonable to assume that items 
produced during the same day are much more highly correlated than 
those produced on different days. Thus, if the observations are sub- 
divided according to the day produced, condition (i) would appear to 
be satisfied. Let the observations produced during a day be ordered 
according to the time produced. Then it seems likely that nearby ob- 
servations in this ordering will be much more highly correlated than 
distant ones; also the expected values of nearby observations will be
very nearly equal. Since eight hours is a relatively short period of time 
and conditions tend to remain constant during a shift, the expected 
values of observations produced on the same day should not differ 
much. Gathering these properties together, it seems likely that a group 
satisfying (ii) and an augmented group satisfying (iii) can be found for 
this situation. Many examples of situations similar to this one can be 
obtained from business and industry. 

Now consider an agricultural example. Let part of the observations 
come from one locality, part from another locality, etc. Subdivide the 
observations according to the localities from which they were obtained. 
Then it is frequently permissible to assume that the average correlation
within localities is very much greater than the average correlation be- 
tween localities. Hence condition (i) will likely be satisfied. Consider 
the plants grown in a specified locality. It seems reasonable to assume 
that plants situated near each other on the plot of ground are much 
more highly correlated than those which are far apart; also, if the 
plants are of the same type, the expected values of nearby observations 
should be very nearly equal. Moreover, if the plants are grown on a 
plot of ground which is of a homogeneous nature with respect to fer- 
tility, weather conditions, irrigation, etc., the expected values of the 
observations from this locality should not differ much. Thus, if the 
number of localities is not too small, it seems likely that conditions 
(i)-(iii) can often be satisfied for situations of this type.

In both examples, nearness in time or distance was assumed to imply 
high correlation and almost equal expected values. There are cases 
where this is not true. Thus the situation considered should be carefully 
examined before this assumption is adopted. 

Suppose that the observations have been divided into subsets for
which conditions (i)-(iii) are satisfied. Let m denote the number of
subsets and consider the kth subset (k = 1, ..., m). For each pair of
the group specified by (ii), subtract the value of one observation from
that of the other and square the resulting difference. The
sum of the squares of these differences will be denoted by S₁(k). As an
illustrative example, let the kth subset consist of the observed values
4, 3, 3, 5, 7 while the group of (ii) consists of the pairs (4, 3), (3, 3),
(3, 5), (5, 7), (7, 4). Then r = 2 and S₁(k) = 18. It is to be emphasized
that in forming pairs the pair (x_i, x_j) is not to be considered different
from the pair (x_j, x_i). Next perform the same operation of taking
differences and squaring for all the pairs of the augmented group of (iii).
Denote the sum of the squares of these differences by S₂(k). The tests
and confidence intervals developed for μ are based on the quantity





$$\sqrt{n}\,(\bar{x} - \mu) \Big/ \sqrt{\Big|\, A \sum_{k=1}^{m} S_1(k) + B \sum_{k=1}^{m} S_2(k) \,\Big|}, \qquad (2)$$


where


$$A = -\frac{s+1}{(s-r)\,n}, \qquad B = \frac{r+1}{(s-r)\,n}.$$


If n is sufficiently large, the distribution of (2) is approximately normal 
with zero mean and variance nearly equal to unity for many situations 
of practical interest. 
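A direct transcription of the recipe for S₁(k), S₂(k), A, B and statistic (2) is sketched below. It is an illustration under stated assumptions: each subset is a list of observations, each group is given as a list of index pairs within its subset (an unordered pair listed once), and the function names are hypothetical. The usage lines reproduce S₁(k) = 18 for the worked subset 4, 3, 3, 5, 7; the augmented group (all ten pairs, so s = 4) and μ = 4 are arbitrary choices, and with a single subset the large-sample conditions are of course not met.

```python
import math


def pair_ss(values, pairs):
    """Sum of squared differences over the given index pairs."""
    return sum((values[i] - values[j]) ** 2 for i, j in pairs)


def walsh_statistic(subsets, groups, aug_groups, r, s, mu):
    """Statistic (2): sqrt(n)(xbar - mu) / sqrt(|A sum S1(k) + B sum S2(k)|)."""
    if s <= r:
        raise ValueError("s must be greater than r")
    n = sum(len(sub) for sub in subsets)
    xbar = sum(sum(sub) for sub in subsets) / n
    a = -(s + 1) / ((s - r) * n)
    b = (r + 1) / ((s - r) * n)
    s1 = sum(pair_ss(sub, g) for sub, g in zip(subsets, groups))
    s2 = sum(pair_ss(sub, g) for sub, g in zip(subsets, aug_groups))
    return math.sqrt(n) * (xbar - mu) / math.sqrt(abs(a * s1 + b * s2))


subset = [4, 3, 3, 5, 7]
group = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]              # r = 2
augmented = group + [(0, 2), (1, 3), (2, 4), (3, 0), (4, 1)]  # s = 4
print(pair_ss(subset, group))       # 18
print(pair_ss(subset, augmented))   # 56
print(round(walsh_statistic([subset], [group], [augmented], r=2, s=4, mu=4), 4))  # 0.3203
```

The `abs` in the denominator follows the absolute value in (2), which guards against the rare case where A Σ S₁(k) + B Σ S₂(k) comes out negative in a particular sample.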

Under some restrictions which are usually not of practical importance,
it can be shown that the asymptotic distribution of (2) is normal
with zero mean and unit variance when the n observations are a random
sample. Thus the results based on (2) have the same efficiency
(asymptotically) as the corresponding tests and confidence intervals based on
(1) for the case of a random sample. However, the results based on (2)
are also valid when conditions (i)-(iii) are satisfied.

What is meant by the statement “n is sufficiently large” as used in 
this section is difficult to specify in general. However, n ≥ 1000 would
seem to be satisfactory for many situations. Also it should be empha- 
sized that the results presented apply to discrete as well as continuous 
variables. 

The approach used in obtaining (2) is similar to that used in stratified 
cluster sampling design for estimating variances. 

Investigation of Specified Class. This section and the next contain 
proof of the results stated in the preceding two sections. In these deriva- 
tions, presentations of a precise technical nature will be avoided. In- 
stead, the principal effort will be guided toward making the method of 
attack clear by avoiding technical detail. For example, a statement of 
the type “Imposing some weak restrictions on the behavior of the fourth 
and lower order moments .. .” will be used in place of an exact state- 
ment of the conditions which are imposed on the fourth and lower order 
moments. However, the derivations will be presented in sufficient de- 
tail to enable a mathematical statistician to reconstruct their principal 
features. 

Now let us investigate how sensitive the tests and confidence
intervals based on (1) and the assumption of a random sample are to slight
violation of condition (a). Denote the mean of x_i by μ_i, its variance by
σ_i², and the correlation between x_i and x_j by ρ_ij (i ≠ j = 1, ..., n); then


$$\mu = \sum_{i=1}^{n} \mu_i / n.$$


Imposing some practically unimportant restrictions on the behavior
(as n increases) of the fourth and lower order moments (mixed or
otherwise) of the multivariate population from which x₁, ..., x_n is a sample
value, the variance of


$$\sum_{i=1}^{n} (x_i - \bar{x})^2/(n-1) \qquad (3)$$


tends to zero as n approaches infinity. The expected value of this
quantity equals


$$\frac{1}{n}\sum_{i=1}^{n} \sigma_i^2 - \frac{1}{n}\sum_{i \neq j} \rho_{ij}\sigma_i\sigma_j/(n-1) + \sum_{i=1}^{n} (\mu_i - \mu)^2/(n-1). \qquad (4)$$
It then follows from Tchebycheff’s Inequality that (3) converges in
probability to the value of (4). 

For many situations of practical interest, the asymptotic distribution 
of 


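$$\sqrt{n}\,(\bar{x} - \mu) \Big/ \sqrt{\frac{1}{n}\sum_{i=1}^{n}\sigma_i^2 + \frac{1}{n}\sum_{i \neq j}\rho_{ij}\sigma_i\sigma_j} \qquad (5)$$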





is standard normal. In the derivations it will be assumed that this is the 
case. Then on the basis of the convergence theorem [1], asymptotically 
the distribution of 


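$$\left[\frac{\frac{1}{n}\sum_{i=1}^{n}\sigma_i^2 - \frac{1}{n}\sum_{i\neq j}\rho_{ij}\sigma_i\sigma_j/(n-1) + \sum_{i=1}^{n}(\mu_i-\mu)^2/(n-1)}{\frac{1}{n}\sum_{i=1}^{n}\sigma_i^2 + \frac{1}{n}\sum_{i\neq j}\rho_{ij}\sigma_i\sigma_j}\right]^{1/2} \frac{\sqrt{n}\,(\bar{x}-\mu)}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2/(n-1)}}$$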








has zero mean and unit variance. Hence, for large n, the variance of (1) 
differs from its hypothetical value of unity by the factor 


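$$[1 + (n-1)\phi]/(1 - \phi + \theta),$$

where


$$\phi = \sum_{i \neq j}\rho_{ij}\sigma_i\sigma_j \Big/ (n-1)\sum_{i=1}^{n}\sigma_i^2, \qquad \theta = n\sum_{i=1}^{n}(\mu_i - \mu)^2 \Big/ (n-1)\sum_{i=1}^{n}\sigma_i^2.$$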


The range of permissible values for φ is −1/(n − 1) to 1. If the σ_i are
equal, φ represents the average correlation among the observations; i.e.,


$$\phi = \sum_{i \neq j}\rho_{ij} \Big/ n(n-1).$$
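The quantities φ and θ, and the resulting variance factor, can be evaluated directly from assumed moments, as in the sketch below. The inputs (standard deviations σ_i, a full correlation matrix ρ_ij and means μ_i) are hypothetical parameters supplied by the user; the function simply codes the two definitions above and the large-n factor [1 + (n − 1)φ]/(1 − φ + θ).

```python
def variance_factor(sigmas, rho, mus):
    """Return (factor, phi, theta) for given sigma_i, correlation matrix rho
    (only the entries rho[i][j] with i != j are used) and means mu_i."""
    n = len(sigmas)
    sum_var = sum(sd * sd for sd in sigmas)
    cross = sum(rho[i][j] * sigmas[i] * sigmas[j]
                for i in range(n) for j in range(n) if i != j)
    mu = sum(mus) / n
    phi = cross / ((n - 1) * sum_var)
    theta = n * sum((m - mu) ** 2 for m in mus) / ((n - 1) * sum_var)
    factor = (1 + (n - 1) * phi) / (1 - phi + theta)
    return factor, phi, theta


# Small n, used only to check the arithmetic: three equally variable
# observations with common correlation 0.5, so phi is the average
# correlation 0.5, theta is 0 and the factor is 4.
rho = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
print(variance_factor([1.0, 1.0, 1.0], rho, [0.0, 0.0, 0.0]))   # (4.0, 0.5, 0.0)
```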


Let us consider a class of one-sided confidence intervals obtained for
μ by use of (1) under the assumption of a random sample. Let K_P
denote the standardized normal deviate (zero mean, unit variance)
exceeded with probability P. The one-sided confidence interval


$$\left(\bar{x} + K_\epsilon\sqrt{\textstyle\sum_{i=1}^{n}(x_i - \bar{x})^2/n(n-1)},\ \infty\right) \qquad (6)$$


is then assumed to have confidence coefficient ε; i.e.,


$$\Pr\left[\bar{x} + K_\epsilon\sqrt{\textstyle\sum_{i=1}^{n}(x_i - \bar{x})^2/n(n-1)} < \mu\right] = \epsilon.$$
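Actually

$$\Pr\left[\bar{x} + K_\epsilon\sqrt{\textstyle\sum_{i=1}^{n}(x_i - \bar{x})^2/n(n-1)} < \mu\right] = \Pr\left\{\bar{x} + K_\alpha\sqrt{[1+(n-1)\phi]\textstyle\sum_{i=1}^{n}(x_i - \bar{x})^2/n(n-1)(1-\phi+\theta)} < \mu\right\} \doteq \alpha,$$

where α is defined by the relation

$$K_\alpha = K_\epsilon\sqrt{(1-\phi+\theta)/[1+(n-1)\phi]}.$$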


If the observations are a random sample, φ = 0, θ = 0 and α = ε. If one
or both of φ and θ differ from zero, however, the value of α can deviate
noticeably from ε. For example, let ε = .05, n = 10,000, θ = 0 and
φ ≥ .001. Then α ≥ .31. If φ ≤ −.00008 for this case, α ≤ .00012. Thus
very slight deviations of φ from zero can result in substantial deviations
of the true confidence coefficient of (6) from its hypothetical value. The
effect of θ is not as great as that of φ and often tends to cancel the effect
of φ. However, if the values of the μ_i differ noticeably, α can differ
greatly from ε on the basis of θ alone. Analogous considerations apply
to other types of one-sided confidence intervals and to two-sided
confidence intervals; also to significance tests based on these confidence
intervals.
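The numerical example can be reproduced from the relation for K_α. The sketch below uses Python's `statistics.NormalDist` for the normal deviates and treats K_P as the deviate exceeded with probability P, as defined above; since that relation is a large-n approximation, so are the results. The function name is illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # zero mean, unit variance


def actual_alpha(eps, n, phi, theta=0.0):
    """alpha with K_alpha = K_eps * sqrt((1 - phi + theta) / (1 + (n - 1) phi)),
    where K_P is the standard normal deviate exceeded with probability P."""
    k_eps = STD_NORMAL.inv_cdf(1.0 - eps)
    k_alpha = k_eps * ((1.0 - phi + theta) / (1.0 + (n - 1) * phi)) ** 0.5
    return 1.0 - STD_NORMAL.cdf(k_alpha)


print(round(actual_alpha(0.05, 10_000, phi=0.001), 2))     # 0.31
print(f"{actual_alpha(0.05, 10_000, phi=-0.00008):.5f}")   # 0.00012
```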

Derivations for (2). Here it will be shown that the asymptotic
distribution of (2) is normal with zero mean and variance approximately
equal to unity if conditions (i)-(iii) and certain mild restrictions are
satisfied.

With some weak restrictions on how the fourth and lower order
moments of the multivariate population from which the sample value
x₁, ..., x_n was drawn behave as n increases, the variance of


$$A\sum_{k=1}^{m} S_1(k) + B\sum_{k=1}^{m} S_2(k) \qquad (7)$$


tends to zero as n approaches infinity. Part 4 of (ii) and part 3 of (iii) 
are used in obtaining this result. 

From (i), parts 1 and 3 of (ii), and part 2 of (iii), the expected value of (7)
is approximately equal to


$$\frac{1}{n}\sum_{i=1}^{n}\sigma_i^2 + \frac{1}{n}\sum_{i\neq j}\rho_{ij}\sigma_i\sigma_j + A\sum_{k=1}^{m}\mu_1^2(k) + B\sum_{k=1}^{m}\mu_2^2(k).$$
ij 1 1 


Here μ₁²(k) equals the sum of the squares of the differences of the
expected values of the pairs for the group of the kth subset specified by
(ii). The quantity μ₂²(k) equals the corresponding sum of squares for
the augmented group specified by (iii). However,


$$A\sum_{k=1}^{m}\mu_1^2(k) + B\sum_{k=1}^{m}\mu_2^2(k) = -\frac{1}{n}\sum_{k=1}^{m}\mu_1^2(k) + \frac{r+1}{(s-r)\,n}\sum_{k=1}^{m}\left[\mu_2^2(k) - \mu_1^2(k)\right].$$


Thus, on the basis of the second part of (ii) and the first part of (iii),
the expected value of (7) is approximately equal to


$$\frac{1}{n}\sum_{i=1}^{n}\sigma_i^2 + \frac{1}{n}\sum_{i \neq j}\rho_{ij}\sigma_i\sigma_j. \qquad (8)$$


As the expected value of (7) is near (8), this expected value is positive
for almost any situation of practical interest. Consequently, from
Tchebycheff’s Inequality, Pr[(7) < 0] → 0 as n → ∞. From this it follows
that the expected value of the absolute value of (7) is approximately
equal to (8) for large n and the variance of the absolute value of (7) also
tends to zero as n → ∞. Since the variance of √n(x̄ − μ) has the value
(8), the properties stated for (2) now follow from Tchebycheff’s
Inequality combined with the convergence theorem [1] and the
asymptotic normality of √n(x̄ − μ).
[1] Harald Cramér, Mathematical Methods of Statistics, Princeton Univ. Press,
A SHORT-CUT MEASURE OF CORRELATION


William A. Spurr
Stanford University 


BEGINNING with a scatter diagram and regression curve, this method
provides a quick graphic estimate of the coefficient or index of 
simple correlation. This is done by measuring the ranges about the 
regression curve and about the mean of the dependent variable, re- 
spectively, that include two-thirds of the items. The ratio of these 
values is approximately the coefficient of alienation, from which the 
coefficient or index of correlation may be looked up in a table. 
This method appears to be reasonably accurate and quicker than 
other short-cut methods. It may also be used to illustrate the meaning 
of correlation for teaching purposes.¹


THE METHOD


The procedure is as follows: First, plot a large-scale scatter diagram 
and a free-hand regression curve for the two variables to be correlated.²

Second, draw two lines parallel to the regression line, as shown in 
Figure 1, so that one-sixth of the dots fall above and one-sixth below 
this band. Thus, if there are 20 points, the line may be drawn through
either the third or fourth dot from the top and bottom, measured to- 
ward the regression curve, provided the same number of dots is used in 
Step 3. This may be done with a transparent ruler or parallel rules set 
along the regression line; or, in the case of a curved line, by tracing the 
curve and the Y axis on a transparent sheet, and moving this sheet up 
and down along the Y axis until one-sixth of the items are excluded on 
either side. 

The vertical width of this band may now be measured on the Y axis. This value is roughly twice the standard error of estimate, $2S_y'$ (the prime indicating the graphic approximation in all symbols), since a range of $1S_y$ above and below the regression line includes about two-thirds of the items in a normal distribution.





1 The writer is indebted to Professor Holbrook Working of Stanford University for suggestions on 
this method. 

2 The subjective error in drawing a free-hand curve may be reduced by first plotting the averages of 
various groups of points as guides. (See Mordecai Ezekiel, Methods of Correlation Analysis, Second Edi- 
tion, New York: Wiley, 1941, pp. 105-110.) The free-hand curve is especially useful in cases where ex- 
treme values are present, where the data do not follow any simple mathematical curve, where time is 
limited, or where only approximate results are desired, as in this method. In case an objective fit is 
desired for data related by a simple mathematical function, with relatively normally distributed residu- 
als, the least squares solution is of course preferable. 

As in correlation generally, either variable may first be transformed into a logarithm, reciprocal or 
other logical function to produce linearity. 
If gaps occur in the data near either of the points marked, the band may be drawn to exclude a tenth or some other fraction of the dots on either side. In any case, the same number of points must fall outside the horizontal band in Step 3.³

Third, set the ruler on the scatter diagram horizontally and mark two straight lines separating off the top sixth of the items and the bottom sixth, as in Figure 1. Measure this spread, too, against the vertical scale of the chart. This is $2\sigma_y'$, roughly twice the standard deviation of the dependent variable, since a range of $1\sigma_y$ above and below the mean of the Y values includes about two-thirds of the items in a normal distribution.

Finally, divide the band-width of Step 2 by that of Step 3. This ratio, $S_y'/\sigma_y'$, is a positional coefficient of alienation, $k'$. With this value, look up the coefficient or index of correlation $r'$ in a table of the function $r = \sqrt{1-k^2}$ or $k = \sqrt{1-r^2}$ (since r and k are reversible), such as Dunlap and Kurtz’s Handbook of Statistical Nomographs, Tables and Formulas (World Book Co., 1932), pp. 84-87. If a table is lacking, compute r from the formula $r = \sqrt{1-(S_y/\sigma_y)^2}$.

In Figure 1, for example, $2S_y' = 5.07$ feet and $2\sigma_y' = 8.00$ feet by measurement on the Y axis; their ratio $k' = .634$, and from the table, $r' = .77$ (compared with .76 for the product-moment r). And that’s all there is to it.
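The arithmetic of this example, using the closed-form alternative to the table, runs as follows:

$$k' = \frac{2S_y'}{2\sigma_y'} = \frac{5.07}{8.00} \approx 0.634, \qquad r' = \sqrt{1 - k'^2} = \sqrt{1 - 0.634^2} \approx \sqrt{0.598} \approx 0.77.$$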

This method may be useful when: (1) a short-cut is needed; (2) an 
approximate result is satisfactory, as in determining whether a given 
relationship justifies mathematical analysis, or as a check on machine 
computation; (3) the basic data are themselves rough, irregular in dis- 
tribution or marred by extreme values, so that more elaborate methods 
do not seem justified; (4) a teaching device is needed that will show the
student the meaning of correlation more clearly through visual demon- 
stration, rather than through abstract formulas. 

This method is believed to be generally simpler and quicker than such other short-cut measures of correlation as Davies’ first moment coefficient,⁴ Jenkins’ graphic and decile devices,⁵ or the method of averaging geometrically the slopes of two regression lines.⁶ This is because of the fewer and easier steps involved. In a test described below, the writer averaged six minutes apiece in computing r’s by this method from sixteen available scatter diagrams.

3 The total deviation may be used instead of a positional range for greater accuracy. To do this, cumulate on the edge of a paper strip the vertical deviations of the dots from the regression curve and, separately, from the mean of the Y values (in Step 3). Then divide these two totals in Step 4 to obtain the coefficient of alienation.

4 This Journal, December, 1930, pp. 413-427.

5 Educational and Psychological Measurement, Winter, 1945, pp. 437-443, and Winter, 1946, pp. 533-536.

6 Ezekiel, op. cit., pp. 140-141.


[Figure 1. Graphic Correlation Example: Diameter and Height of Twenty Trees. Vertical axis: Height Growth in Feet; horizontal axis: Diameter Growth at Breast Height, in Inches.]
ACCURACY OF RESULTS


Granting the simplicity of the method, it still looks rather crude. 
The question remains, therefore: How accurate is it? This point is dis- 
cussed below in four parts. 

1. In the first place, the coefficient of correlation is a function of the coefficient of alienation $S_y/\sigma_y$. This is merely the ratio of the scatter about the regression curve to the total scatter of the Y values. The standard deviation or standard error is theoretically the best measure of scatter in an ideal distribution. There are, of course, alternative measures of dispersion suitable for varying circumstances, such as the interquartile range, which is similar to the intersextile range used here. These positional measures have certain advantages of their own,⁷ such as simplicity and occasionally even greater accuracy than the standard deviation if the latter is warped by extreme values. In a normal distribution a positional range will give the same result as the standard deviation, subject to a somewhat greater sampling error. Too often in correlation, however, the rigid Pearsonian calculations are followed dogmatically when the reliability of the raw data or the accuracy required in the results justifies neither the labor nor the underlying assumptions of this method.

2. The discrepancy between the graphic estimate and the computed value of $S_y/\sigma_y$ has little effect on r when r is high, but much effect when it is low, because of the nature of their functional relationship. As shown in Figure 2, a large error in $S_y/\sigma_y$ at one end of the scale, from 0.0 to 0.1, would change r only from 1.000 to .995. The same shift in $S_y/\sigma_y$ from 0.9 to 1.0 at the other end would cause r to plunge from .436 to .000. The graphic error $(r - r')$ is therefore smallest for high values of r, and indeed approaches zero as r approaches 1.
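The endpoint figures quoted above follow directly from $r = \sqrt{1-k^2}$. The Python check below uses an illustrative helper name, `r_from_alienation`, which does not come from the text.

```python
import math


def r_from_alienation(k):
    """Correlation implied by a coefficient of alienation k = S_y / sigma_y."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    return math.sqrt(1.0 - k * k)


# The same 0.1 shift in k at each end of the scale
for lo, hi in [(0.0, 0.1), (0.9, 1.0)]:
    print(f"k {lo:.1f} -> {hi:.1f}: r {r_from_alienation(lo):.3f} -> {r_from_alienation(hi):.3f}")
```

The steep slope of the curve near k = 1 is why a given graphic error in k costs much more in r at the low end.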

As r drops, however, in a sample of given size, both the graphic discrepancy and the standard error of the computed r increase pari passu (since both follow the same function $1 - r^2$). The graphic error is therefore about the same for low values of r as for high values in terms of the standard error in r.

The apparent precision in the graphic method for high values of r really reflects a defect in r itself. The value r exaggerates the degree of correlation present, in the writer’s opinion, thus leading the unwary researcher astray. A better measure would be $r^2$, which shows the more meaningful ratio of explained to total variance, or even the still more




7 For further discussion, see Frederick Mosteller, “On Some Useful Inefficient Statistics,” Annals of Mathematical Statistics, Dec. 1946, pp. 377-408.
conservative “prediction index”⁸ (P. I.), which is simply the straight-line function $1 - S_y/\sigma_y$.

3. The sampling error in this method may be reduced in a normal distribution by using the convenient 10-90 percentile range (which is nearly as reliable as the optimum 7-93 percentile range) or, better still, the graphic average deviation⁹ described in Footnote 3. The “intersextile” range is suggested here instead because of its simplicity, its equivalence to $2\sigma$ (which is useful as a teaching device), and its superior reliability over the more commonly used interquartile range. A precise comparison of sampling errors is not appropriate, however. The graphic procedure involves the additional subjective error of drawing the regression curve free-hand and the minor error of chart reading. The point is that frequently a short-cut method is needed at the expense of some accuracy, or a larger sample can be used with a simpler procedure without loss of accuracy. The “most efficient” method is therefore not necessarily the best in practice.

[Figure 2. Three Functions of the Alienation Coefficient $S_y/\sigma_y$: curves for r, $r^2$, and P.I., plotted against $S_y/\sigma_y$ from 0 to 1.0.]

8 Alan E. Treloar, Elements of Statistical Reasoning (New York: Wiley, 1939), pp. 125-127.

9 See Truman Kelley, Fundamentals of Statistics (Harvard University Press, 1947), pp. 230-233, for a comparison of sampling errors.
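The claimed near-equivalence of the intersextile range to $2\sigma$ can be checked from normal percentiles. The Python sketch below uses the standard-library `statistics.NormalDist`; the helper name is hypothetical. It compares only the *widths* of the positional ranges mentioned in part 3, not their sampling reliability.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def positional_range_in_sigmas(tail):
    """Width, in sigmas, of the central range leaving `tail` of a normal distribution in each tail."""
    if not 0.0 < tail < 0.5:
        raise ValueError("tail must lie strictly between 0 and 0.5")
    return STANDARD_NORMAL.inv_cdf(1.0 - tail) - STANDARD_NORMAL.inv_cdf(tail)


for label, tail in [("interquartile", 0.25), ("intersextile", 1 / 6),
                    ("10-90 percentile", 0.10), ("7-93 percentile", 0.07)]:
    print(f"{label:>16}: {positional_range_in_sigmas(tail):.3f} sigma")
```

The intersextile range comes out at about $1.93\sigma$, slightly under $2\sigma$, which is why the text says "roughly twice." Because $k'$ is a ratio of two such ranges, the common constant cancels when both distributions are normal.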

4. The writer tested this method on sixteen textbook scatter diagrams (averaging only 3½ inches in width), including eleven linear and five curvilinear regressions. The regression curves were drawn free-hand, where needed, without benefit of group averages. The average time required to draw the curves and to compute $r'$ or $\rho'$ was six minutes; the range was four to ten minutes. The graphic sextile deviation differed from the standard error of estimate by 8% on the average, but the difference in r or ρ averaged only .02. The errors were least for high values of r and for larger samples.

The graphic and mathematical values of r and ρ were converted to z’s (Fisher’s transformation of r, which has a nearly normal distribution). The discrepancy $(z - z')$ was then compared with the sampling error of z, where $\sigma_z = 1/\sqrt{N-3}$. In the majority of cases the discrepancy was less than half the standard error of the computed z. In only one case (a curvilinear regression) did the discrepancy exceed the standard error. The median value of the ratio $(z - z')/\sigma_z$ was 0.37 for eleven linear regressions, 0.57 for five curvilinear cases, and 0.42 for all sixteen examples. The extreme value was 1.22. The ratio did not vary consistently with either r or N. Sampling errors alone could cause several times as much difference as this in the computed values of r or ρ, and the graphic errors were purposely maximized in this experiment. It is therefore concluded that this method is reasonably accurate for many practical purposes.
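The z comparison of part 4 can be reproduced for any single diagram. The Python sketch below applies it, as an illustration, to the Figure 1 values quoted earlier (r = .76, r′ = .77, twenty trees). The function name is hypothetical.

```python
import math


def z_discrepancy_ratio(r_computed, r_graphic, n):
    """Return (z - z') / sigma_z, with z = atanh(r) (Fisher) and sigma_z = 1 / sqrt(N - 3)."""
    if n <= 3:
        raise ValueError("need N > 3")
    if not (-1.0 < r_computed < 1.0 and -1.0 < r_graphic < 1.0):
        raise ValueError("correlations must lie strictly between -1 and 1")
    z = math.atanh(r_computed)      # Fisher's z for the computed r
    z_graphic = math.atanh(r_graphic)  # Fisher's z for the graphic r'
    sigma_z = 1.0 / math.sqrt(n - 3)
    return (z - z_graphic) / sigma_z


ratio = z_discrepancy_ratio(0.76, 0.77, 20)
print(f"|z - z'| / sigma_z = {abs(ratio):.2f}")
```

For these values the ratio is about 0.10, small compared with the median of 0.42 reported for the sixteen test diagrams.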





ON STRATIFICATION AND OPTIMUM ALLOCATIONS 


W. Duane Evans 
U. S. Bureau of Labor Statistics 


Useful condensations of the variance equations for estimates of the mean obtained from stratified proportional and optimum allocation samples are presented, and a criterion for deciding between semi-optimum and proportional allocation designs is derived.


A population including N individuals is distributed according to a variate X with mean $\bar X$ and variance $\sigma^2$. The population is divided among r strata, and in the i-th stratum the $N_i$ individuals are distributed with mean $\bar X_i$ and variance $\sigma_i^2$. A sample is taken in each stratum. The sample taken in the i-th stratum includes $n_i$ members, and the aggregate sample size for all strata is n. An estimate of the population mean is formed by combining the separate stratum sample means. The weights are the proportions of the total population found in the several strata. Letting $p_i\,(= N_i/N)$ denote this proportion in the i-th stratum, and using primes to denote estimates,

$$\bar X' = \sum_{i=1}^{r} p_i \bar X_i'. \tag{1}$$


Following Neyman [1], or using the more concise method of Cornfield [2], the variance of such estimates is found to be

$$\sigma_{\bar X'}^2 = \sum_{i=1}^{r} \frac{p_i^2\,\sigma_i^2\,(N_i - n_i)}{n_i\,(N_i - 1)}. \tag{2}$$

Throughout the following, it will be assumed for purposes of simplifi- 
cation that the numbers of individuals included in the various strata 
are either large enough or sufficiently similar to permit the following 
approximation 


$$N_i/(N_i - 1) \approx N/(N - r). \tag{3}$$


This relationship is, of course, exact for strata of equal size, and nearly exact for large strata. For any stratum it will always yield a figure within the range of the values of $N_i/(N_i - 1)$ for the various strata. It thus provides a very close approximation except in extreme cases.

Combining (2) and (3), the general expression for the variance of means estimated from stratified samples becomes

$$\sigma_{\bar X'}^2 = \frac{N}{N-r}\sum_{i=1}^{r}\frac{p_i^2\sigma_i^2}{n_i} - \frac{1}{N-r}\sum_{i=1}^{r} p_i\sigma_i^2. \tag{4}$$
If sample allocations are to be proportional to stratum size,

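$$n_i = n p_i, \tag{5}$$

and substitution in (4) yields

$$\sigma_{\bar X'}^2 = \frac{N-n}{n(N-r)}\sum_{i=1}^{r} p_i\sigma_i^2. \tag{6}$$

But

$$\begin{aligned}
\sigma^2 &= \frac{1}{N}\sum_{i=1}^{r}\sum_{j=1}^{N_i}(X_{ij} - \bar X)^2
= \frac{1}{N}\sum_{i=1}^{r}\sum_{j=1}^{N_i}\left[(X_{ij} - \bar X_i) + (\bar X_i - \bar X)\right]^2\\
&= \sum_{i=1}^{r} p_i\,\frac{1}{N_i}\sum_{j=1}^{N_i}(X_{ij} - \bar X_i)^2 + \sum_{i=1}^{r} p_i(\bar X_i - \bar X)^2\\
&= \sum_{i=1}^{r} p_i\sigma_i^2 + \sigma_{\bar X_i}^2.
\end{aligned}\tag{7}$$

Combining (7), the basic equation of the analysis of variance, with (6) gives the variance of estimate in terms of the total variance and the weighted variance among the stratum means:

$$\sigma_{\bar X'}^2 = \frac{N-n}{n(N-r)}\left[\sigma^2 - \sigma_{\bar X_i}^2\right]. \tag{8}$$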
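Turning now to the case of optimum allocations, Neyman has shown that the variance of the estimate will be least if the stratum sample sizes are determined by a relationship which, when modified by (3), may be written:

$$n_i = n p_i\sigma_i \Big/ \sum_{j=1}^{r} p_j\sigma_j. \tag{9}$$

Combining (4) and (9),

$$\begin{aligned}
\sigma_{\bar X'}^2 &= \frac{N}{n(N-r)}\Big(\sum_{i=1}^{r} p_i\sigma_i\Big)^2 - \frac{1}{N-r}\sum_{i=1}^{r} p_i\sigma_i^2\\
&= \frac{N-n}{n(N-r)}\sum_{i=1}^{r} p_i\sigma_i^2 - \frac{N}{n(N-r)}\Big[\sum_{i=1}^{r} p_i\sigma_i^2 - \Big(\sum_{i=1}^{r} p_i\sigma_i\Big)^2\Big].
\end{aligned}\tag{10}$$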

The first term in this expression is equivalent to (8), and the bracketed part of the second term is clearly the weighted variance among the stratum standard deviations. Representing this by an appropriate symbol, $\sigma_{\sigma_i}^2$, the equation becomes

$$\sigma_{\bar X'}^2 = \frac{N-n}{n(N-r)}\left[\sigma^2 - \sigma_{\bar X_i}^2 - \frac{N}{N-n}\,\sigma_{\sigma_i}^2\right]. \tag{11}$$

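Assembling results, and using single, double and triple primes to distinguish among them, we have for simple random, proportional stratified, and optimum allocation stratified sampling:

$$\sigma_{\bar X'}^2 = \frac{\sigma^2(N-n)}{n(N-1)}, \tag{12a}$$

$$\sigma_{\bar X''}^2 = \frac{N-n}{n(N-r)}\left[\sigma^2 - \sigma_{\bar X_i}^2\right], \tag{12b}$$

$$\sigma_{\bar X'''}^2 = \frac{N-n}{n(N-r)}\left[\sigma^2 - \sigma_{\bar X_i}^2 - \frac{N}{N-n}\,\sigma_{\sigma_i}^2\right]. \tag{12c}$$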


These equations are of some intrinsic interest. They show in quite 
concise form and in logical juxtaposition the factors on which a reduc- 
tion in variance through stratification and optimum allocations de- 
pends. The equations are convenient in form for estimating the aggre- 
gate size of sample required for a stated precision by the different de- 
signs, and they may be helpful in balancing a potential improvement in 
precision against possible extra expenses entailed by the more complex 
schemes. They show clearly, as remarked by Armitage [3], that under 
certain extreme conditions the simple random method may have a 
smaller variance than either of the stratified schemes. Parenthetically, 
the factor N/(N—n) in the bracketed section of (12c) indicates that 
optimum allocations tend to be of greater importance as the sampling 
ratio increases. 

A qualification of (12c) may be noted. The allocation procedure 
specified by (9), especially with large samples, may indicate a sample 
size exceeding the actual number of sampling units in a stratum. The 
procedure then followed is to make a complete enumeration in such 
strata, reallocating the balance of the sample on an optimum basis 
among the remaining strata. In such a case, all quantities in (12c) and 
its later derivatives in the text must be redefined to refer appropriately 
to the strata in which samples rather than complete enumerations are 
taken, and term by term comparison with (12a) and (12b) is no longer 
permissible. 
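To see the three formulas side by side, the Python sketch below evaluates (12a), (12b) and (12c) from stratum sizes, means and standard deviations. It also cross-checks (12b) and (12c) against (4) under the allocations (5) and (9). The `Stratum` class and the helper names are hypothetical, and the stratum figures in the test are invented for illustration. Allocations are left fractional. The sketch does not handle the complete-enumeration case just described, in which (9) calls for $n_i > N_i$.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    size: int    # N_i, number of individuals in the stratum
    mean: float  # stratum mean X-bar_i
    sd: float    # stratum standard deviation sigma_i (divisor N_i)


def design_variances(strata, n):
    """Return (12a), (12b), (12c): variances of the estimated mean for total sample n."""
    N = sum(s.size for s in strata)
    r = len(strata)
    p = [s.size / N for s in strata]
    grand_mean = sum(pi * s.mean for pi, s in zip(p, strata))
    within = sum(pi * s.sd ** 2 for pi, s in zip(p, strata))  # sum of p_i sigma_i^2
    between = sum(pi * (s.mean - grand_mean) ** 2 for pi, s in zip(p, strata))  # sigma^2 of means
    total = within + between  # equation (7)
    mean_sd = sum(pi * s.sd for pi, s in zip(p, strata))
    var_sd = within - mean_sd ** 2  # weighted variance among stratum standard deviations
    factor = (N - n) / (n * (N - r))
    simple = total * (N - n) / (n * (N - 1))                   # (12a)
    proportional = factor * (total - between)                    # (12b)
    optimum = factor * (total - between - N / (N - n) * var_sd)  # (12c)
    return simple, proportional, optimum


def neyman_allocation(strata, n):
    """Equation (9): n_i proportional to p_i * sigma_i (fractional, not rounded)."""
    N = sum(s.size for s in strata)
    weights = [s.size / N * s.sd for s in strata]
    return [n * w / sum(weights) for w in weights]


def variance_eq4(strata, alloc):
    """Equation (4) evaluated for an arbitrary allocation n_i."""
    N = sum(s.size for s in strata)
    r = len(strata)
    p = [s.size / N for s in strata]
    first = sum(pi ** 2 * s.sd ** 2 / ni for pi, s, ni in zip(p, strata, alloc))
    second = sum(pi * s.sd ** 2 for pi, s in zip(p, strata))
    return N / (N - r) * first - second / (N - r)
```

With strata whose means and standard deviations differ appreciably, the three values fall in the order (12a) > (12b) > (12c). As the text notes, that ordering is not guaranteed in extreme cases.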
The form of (12b) might suggest that stratification will always yield
an advantage, since even randomly established strata will on the 
average show some variation among the stratum means. To investigate 
the possibility, assume that r strata are set up at random and that 
proportional allocations are used because information on the stratum 
standard deviations is not available. Using the exact form of (2) and 
incorporating (5), the average value of the variance of the estimated 
mean over repeated random sorts is given by 





> pi(N; — np,) 


(13) E(oz.*) = nN. = 1) 


E(o;?) . 


But the i-th stratum now represents a random sample of size N; from 
the population, and 


: N(N; -— 1) 
NAN — 1)° 


(14) E(o;7) =a 


Combining, it is seen that 
o7(N — n) 


(15) E(oz.*) = nN —1) 


which is the same as (12a). The factor $(N - r)$ in the denominator of (12b) is then just large enough to compensate for the average value of the variance among stratum means on a random sort.

The converse is of some importance. Since on the average even random stratification will not reduce precision, stratification may be employed without hesitation whenever there is even slight justification
for supposing that the variable under study is related to the proposed 
mode of stratification. The only qualification is that continued division 
of the population into smaller strata may entail extra operating ex- 
penses that overbalance a possible additional reduction in the variance 
of estimates, but this can be judged only in the particular instance. 
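The argument of (13) to (15) can be checked by brute force. The Python sketch below uses hypothetical names and, purely for illustration, a population made up of the integers 1 to 60. It repeatedly sorts the population at random into equal strata, evaluates the exact form of (2) under proportional allocation (5), and averages the results. That average should settle near (12a). Equal strata are assumed so that every $n_i$ is a whole number.

```python
import random
import statistics


def exact_variance_proportional(strata_values, n):
    """Equation (2), exact form, with proportional allocation n_i = n p_i from (5)."""
    N = sum(len(values) for values in strata_values)
    total = 0.0
    for values in strata_values:
        N_i = len(values)
        p_i = N_i / N
        n_i = n * p_i
        var_i = statistics.pvariance(values)  # sigma_i^2 with divisor N_i
        total += p_i ** 2 * var_i * (N_i - n_i) / (n_i * (N_i - 1))
    return total


def mean_over_random_sorts(population, r, n, reps, seed=0):
    """Average the variance above over `reps` random sorts into r equal strata."""
    if len(population) % r:
        raise ValueError("this sketch assumes strata of equal size")
    size = len(population) // r
    rng = random.Random(seed)
    pool = list(population)
    acc = 0.0
    for _ in range(reps):
        rng.shuffle(pool)  # one random sort of the population
        strata = [pool[k * size:(k + 1) * size] for k in range(r)]
        acc += exact_variance_proportional(strata, n)
    return acc / reps
```

Comparing the result with `statistics.pvariance(population) * (N - n) / (n * (N - 1))`, which is (12a), shows the agreement up to simulation noise.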

Excluding an interest in stratum estimates, the principal object of 
stratification (and the only object where proportional allocations are to 
be made) will be to increase the variance among the stratum means. 
But this may become an ambiguous guide where several characteristics 
of the population are to be measured. If two characteristics under study 
are perfectly correlated, either positively or negatively, any mode of 
stratification will, of course, give an equivalent improvement in the 
estimates for each. If the two characteristics are not correlated at all, 
a given mode of stratification may improve estimates for neither, for 
one but not the other, or for both. With intermediate degrees of correla-
tion almost any result is possible. In practice, however, it will be found 
that an arrangement into strata which improves estimates for one 
characteristic will usually improve them for any other which is corre- 
lated positively or negatively with the first. But it must be remembered 
that this is not a necessary result. 

The idea may be formulated in more precise but perhaps no more 
useful terms as follows: Values for two variates, X and Y, are known 
for each member of a population of size N. A set of r positive numbers 
$(N_1, N_2, \dots, N_r)$ whose sum is N is given. From the population $N_1$
members are selected, then $N_2$ from among the remainder, and so on
until all are allocated. The process is repeated until all possible arrange- 
ments are recorded. Each arrangement may be regarded as a possible 
stratification of the population among r strata of the given size. The 
stratum averages of X and Y are noted for each arrangement. For 
some of these arrangements, the variance of an estimate of the popula- 
tion mean of X based on a stratified sample will be less than that for a 
simple random sample, and these modes of stratification may be called 
X-favorable. Some modes will be Y-favorable. If X and Y are corre- 
lated, either positively or negatively, the proportion of Y-favorable 
modes among the X-favorable will be higher than among all possible 
modes. As the correlation increases, the probability that an X-favorable 
mode will also be Y-favorable likewise increases, approaching certainty 
as the correlation approaches unity. Moreover, this is true in degree as 
well as kind, strongly X-favorable modes tending also to be strongly
Y-favorable as the correlation increases. This is of limited practical 
importance because modes of stratification cannot be picked at random 
in this fashion, but it shows that there is some substance behind the 
intuitive feeling that an X-favorable mode of stratification should in 
some sense favor other correlated variables. 

Where proportional allocations are used with stratification to im- 
prove the estimates of a first characteristic, the worst result to be antici- 
pated for a second is that the variance may be increased above the 
variance for a random sample of the same size by the usually small 
factor $(N - 1)/(N - r)$, and this only in the very unusual case that the
stratum means of the second characteristic are all identical. In general, 
then, stratification with proportional allocations to improve estimates 
of a first characteristic will sacrifice little, if anything, with respect to 
estimates of others. Where estimates of several characteristics are re- 
quired, the use of multiple modes of stratification, each related to one 
or more of the characteristics under study, may be especially helpful. 
Somewhat greater care is necessary when it is desired to use optimum
allocations. An optimum allocation for a first characteristic will tend to 
degrade estimates for a second unless the stratum standard deviations 
are positively correlated. In extreme cases, the estimate for the second 
characteristic may be substantially worse than would be obtained with 
a random sample of the same size. There is, however, an observable 
tendency for stratum standard deviations to be positively correlated 
when the characteristics are correlated either positively or negatively 
in the population. The reasoning applied to correlations among favor- 
able modes of stratification may be extended to this case as well. 

It remains to point out one very important difference between (12b)
and (12c). The former, if it can be evaluated, will give the actual vari- 
ance resulting from the sampling operation. The latter represents a 
minimum or ideal figure which in fact is never attained. In practice, the 
values of the stratum standard deviations are never known exactly, and 
estimates of these quantities must be used in evaluating the allocation 
equation. To the extent that these estimates are inaccurate, the vari- 
ance of estimate will exceed that indicated by (12c). The variance may 
even be greater than that of proportional allocations, as a simple exam- 
ple will show. 

Suppose that in a particular situation the stratum standard devia- 
tions are in fact all equal (but this is not known). The proper optimum 
allocations will then be exactly proportional to stratum size. The use 
now of any estimating procedure to determine stratum standard devia- 
tions for allocation purposes, since in at least some cases it will yield 
non-proportional and non-optimum allocations, will increase the vari- 
ance beyond that of proportional allocations. 

This problem seems to have received little consideration in the litera- 
ture. Sukhatme [4] investigated the proposal to use a smaller sample 
from each stratum to estimate the standard deviations, and then use 
the results to determine allocations for the complete sample. He simpli- 
fied the variance equation by dropping the finite multiplier and as- 
sumed that the within stratum population distribution was approxi- 
mately normal. Fitting a Pearson curve to the general variance dis- 
tribution, he computed (for the series of cases he considered) the prob- 
ability that the proposed procedure would yield better results than 
might have been obtained with proportional allocations. He concluded 
that in most cases there would be an improvement. But the example 
given above (which Sukhatme mentions and dismisses) shows that 
this cannot be taken as a universal rule. 
Trying a somewhat different approach, let $S_i$ represent an estimate of a number proportional to the standard deviation in the i-th stratum. Assume further that the estimating procedure employed will be such that on the average bias may be ignored; that is, where K is a constant,

$$E(S_i) = K\sigma_i. \tag{16}$$

The sample allocations actually made will then, continuing the approximation of (3), be in accordance with the following:

$$n_i = n p_i S_i \Big/ \sum_{j=1}^{r} p_j S_j. \tag{17}$$

Substitution in (4) yields
$$\sigma_{\bar X'}^2 = \frac{N}{n(N-r)}\sum_{i=1}^{r}\sum_{j=1}^{r} p_i p_j \sigma_i^2\,\frac{S_j}{S_i} - \frac{1}{N-r}\sum_{i=1}^{r} p_i\sigma_i^2. \tag{18}$$

The expected value of this expression over a sequence of estimates obviously depends on the expected value of $S_j/S_i$, all other terms being constant. Under any estimating procedure usually followed, $S_i$ and $S_j$ ($i \neq j$) will be independent. Expand the reciprocal of the denominator, $1/S_i$, in a series about $1/(K\sigma_i)$, retain the first three terms, and take expected values. This gives as an approximation for the desired value

$$E\!\left(\frac{S_j}{S_i}\right) \approx \frac{\sigma_j}{\sigma_i}\left[1 + C^2(S_i)\right], \qquad i \neq j, \tag{19}$$

where $C(S_i)$ is the coefficient of variation of $S_i$, defined as its standard deviation divided by its expected value. In a given situation, however, a uniform procedure will be followed for estimating the stratum standard deviations. It is then not at all unlikely that this quantity will be effectively constant from stratum to stratum. Indeed, the estimating procedure may well be set up on just this basis. For the present purpose, it will be assumed that

$$C(S_i) = C \qquad (i = 1, \dots, r). \tag{20}$$

Combining (18), (19), and (20) yields, after simplifying,

$$E(\sigma_{\bar X'}^2) = \frac{N}{n(N-r)}\left[(1+C^2)\Big(\sum_{i=1}^{r} p_i\sigma_i\Big)^2 - C^2\sum_{i=1}^{r} p_i^2\sigma_i^2 - \frac{n}{N}\sum_{i=1}^{r} p_i\sigma_i^2\right]. \tag{21}$$


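Following the methods previously used, and representing the complement of $p_i$ by $q_i$, this reduces to

$$E(\sigma_{\bar X'}^2) = \frac{N-n}{n(N-r)}\left[\sigma^2 - \sigma_{\bar X_i}^2 - \frac{N}{N-n}\,\sigma_{\sigma_i}^2 + \frac{N}{N-n}\,C^2\Big(\sum_{i=1}^{r} p_i q_i\sigma_i^2 - \sigma_{\sigma_i}^2\Big)\right]. \tag{22}$$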


This expression, subject to the approximations involved in using (3), 
(16), (19), and (20), represents the actual variance which will be en- 
countered in practice when optimum allocations are attempted on the 
basis of estimated standard deviations. It is thus more directly com- 
parable with (12a) and (12b) than is (12c). 

The last term within the brackets clearly measures the loss in effi- 
ciency involved in using estimates in place of the true standard devia- 
tions. This term must always be positive (or zero) since the expression 
within the parentheses on reduction is seen to be equivalent to a square 
matrix of positive cross-products with the main diagonal omitted. 

Of more immediate practical interest, it is clear from comparison 
with (12b) that the use of estimated standard deviations for allocation 
purposes will yield results inferior to proportional allocations when the 
sum of the last two terms in (22) is positive. This implies that when 


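$$C^2 > \sigma_{\sigma_i}^2 \Big/ \Big(\sum_{i=1}^{r} p_i q_i\sigma_i^2 - \sigma_{\sigma_i}^2\Big) \tag{23}$$

the use of proportional allocations will be preferred. A somewhat more convenient computational form is given by

$$C^2 > \frac{\sum_{i} p_i\sigma_i^2 - \left(\sum_{i} p_i\sigma_i\right)^2}{\left(\sum_{i} p_i\sigma_i\right)^2 - \sum_{i} p_i^2\sigma_i^2}. \tag{24}$$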


A solution for the right side of the inequality will set a critical upper 
limit for the coefficient of variation of estimates of the stratum stand- 
ard deviations above which optimum allocations should not be at- 
tempted. It will be observed that the same information is required for 
an approximate evaluation as for making the attempted optimum 
allocations. A value for the left side of the inequality will depend on the 
type of estimating procedure used. A common sense evaluation may be based simply on experience in the subject matter field, for example on general knowledge of the success with which rental distributions within an area may be used to indicate income variability.
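Once trial values of $p_i$ and $\sigma_i$ are in hand, both sides of the comparison can be computed. The Python sketch below (helper names hypothetical) gives the critical value on the right of (24), together with (22) and (12b) so that the two variances can be compared directly. Both variances are written with $\sum p_i\sigma_i^2$ in place of $\sigma^2 - \sigma_{\bar X_i}^2$, by (7). The sketch assumes at least two strata with positive $p_i$ and $\sigma_i$.

```python
def critical_c2(p, sd):
    """Right side of (24): squared CV of the sd estimates above which proportional allocation wins."""
    s1 = sum(pi * si for pi, si in zip(p, sd))             # sum p_i sigma_i
    s2 = sum(pi * si ** 2 for pi, si in zip(p, sd))        # sum p_i sigma_i^2
    s2w = sum(pi ** 2 * si ** 2 for pi, si in zip(p, sd))  # sum p_i^2 sigma_i^2
    return (s2 - s1 ** 2) / (s1 ** 2 - s2w)


def expected_variance_estimated_sd(p, sd, N, n, c2):
    """Equation (22): expected variance when allocations use sd estimates with squared CV c2."""
    r = len(p)
    s1 = sum(pi * si for pi, si in zip(p, sd))
    within = sum(pi * si ** 2 for pi, si in zip(p, sd))
    var_sd = within - s1 ** 2                                   # weighted variance among sigma_i
    pq = sum(pi * (1 - pi) * si ** 2 for pi, si in zip(p, sd))  # sum p_i q_i sigma_i^2
    g = N / (N - n)
    return (N - n) / (n * (N - r)) * (within - g * var_sd + g * c2 * (pq - var_sd))


def proportional_variance(p, sd, N, n):
    """Equation (12b), written with sum p_i sigma_i^2."""
    within = sum(pi * si ** 2 for pi, si in zip(p, sd))
    return (N - n) / (n * (N - len(p))) * within
```

When all $\sigma_i$ are equal, `critical_c2` returns 0. This echoes the earlier example: no estimation procedure can then make attempted optimum allocation better than proportional allocation.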

If some number, say D, can be found which is known to equal or 
exceed $C(S_i)$ for each stratum, its insertion in place of C in the right
side of (22) will yield an expression which equals or exceeds the ex- 
pected value of the variance of estimate. Comparing with (12b), it is 
evident that if the square of D is less than the right side of (24), use of 
the estimated standard deviations in making allocations will certainly 
yield a lower variance than will proportional allocations. 

To illustrate, consider the following examples. 


Case A Case B 








Stratum 
%; ; oi 


20 . 30 
30 , 31 
40 , 32 





An evaluation of the inequality for case A indicates that optimum allocations will be preferred whenever the coefficient of variation in estimating the stratum standard deviations is less than 0.42. This is a very modest requirement. In this case there is substantial variation among the stratum standard deviations, so even crude information regarding stratum variability may result in substantial improvement of estimates of the mean.

For case B, in contrast, proportional allocations would be preferred unless the coefficient of variation of the stratum standard deviation estimates could be brought below 0.037. Here the precision requirements are rather high. It is fortunate, but of course not coincidental, that the data requirements are least strict where attempted optimum allocations are likely to aid most in improving estimates of the mean.

To return to Sukhatme’s conjecture, the point at issue is certainly 
not whether presampling will yield information about stratum stand- 
ard deviations; any sample numbering two or more will do this. The 
essential consideration is whether a sample of a size which is adminis- 
tratively feasible will yield sufficient information to improve estimates 
of the mean obtained by approximate optimum allocations above those 
obtained by proportional samples. 

It may be shown (though the development is too long for inclusion 
here) that the squared coefficient of variation for estimates of the stand- 
ard deviation obtained by means of a sample of size m from a substan- 
tially larger population is approximately 


$$C^2 = \frac{\beta_2 - 1}{4(m - 1)}, \tag{25}$$

where $\beta_2$ is the familiar Pearson kurtosis criterion for the population from which the sample is drawn. By rearrangement,

$$m = 1 + \frac{\beta_2 - 1}{4C^2}. \tag{26}$$

Combining with (23),

$$m = 1 + \frac{(\beta_2 - 1)\left(\sum_{i} p_i q_i\sigma_i^2 - \sigma_{\sigma_i}^2\right)}{4\sigma_{\sigma_i}^2}. \tag{27}$$


In this expression, m represents (approximately) the minimum size of 
presample within a stratum which on the average will yield sufficient 
information about the standard deviation to make an attempt at opti- 
mum allocations at least as attractive as proportional allocations. 

For purposes of illustration, we may return to the examples given 
above. Suppose that in each of the strata for both cases A and B the 
value of $\beta_2$ is about 5. (Values for distributions of families according to
their incomes may range from 10 to as much as 25.) Equation (27) then 
indicates that for case A any sample exceeding 7 per stratum (or 21 in 
total) would permit allocations which would tend to improve the re- 
sults. A presample of as few as 50 per stratum would almost certainly 
yield allocations reducing the variance of estimates of the mean well 
below the variance of a proportional scheme. 
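Equation (26) is easy to evaluate directly. The minimal Python sketch below (function name hypothetical) applies it to the case A figures above, $\beta_2 \approx 5$ and a critical C of about 0.42.

```python
def min_presample(beta2, c):
    """Equation (26): presample size per stratum giving sd estimates with coefficient of variation c."""
    if beta2 <= 1.0 or c <= 0.0:
        raise ValueError("need beta2 > 1 and c > 0")
    return 1.0 + (beta2 - 1.0) / (4.0 * c * c)


print(f"case A: m = {min_presample(5.0, 0.42):.1f}")
```

The result, about 6.7, agrees with the figure of roughly 7 per stratum given above. Because m varies as $1/C^2$, small changes in a rounded critical C move m considerably when C is small, as in case B.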

In contrast, a presample exceeding 742 individuals per stratum would 
be required in case B to make proportional allocations the less desirable 
alternative. It is unlikely that such a large presample (perhaps exceed- 
ing the desired aggregate size of the final sample) would in many cases 
be considered feasible. 

Presampling is not always administratively possible. When it is, 
values obtained from the presample may be used to evaluate approxi- 
mately the right side of (27). A choice between proportional and opti- 
mum allocations may then be made depending on whether the indi- 
cated value for m exceeds or falls below an administratively determined 
upper limit for the size of the presample. Once this choice has been 
made, (12b) or (22), as appropriate, may be used to determine approxi- 
mately the aggregate size of sample required for a given level of pre- 
cision. 

REFERENCES 
[1] Neyman, J., “On the Two Different Aspects of the Representative Method,” 
Journal of the Royal Statistical Society, 97 (1934) 558-625. 
[2] Cornfield, Jerome, “On Samples from Finite Populations,” Journal of the 
American Statistical Association, 39 (1944) 236-9. 
[3] Armitage, P., “A Comparison of Stratified with Unrestricted Random 
Sampling from a Finite Population,” Biometrika, 34 (1947) 273-80. 


[4] Sukhatme, P. V., “Contribution to the Theory of the Representative Meth- 
od,” Journal of the Royal Statistical Society, Supplement, 2 (1935) 253-68. 



















SAMPLING WITH PROBABILITIES PROPORTIONAL TO 
SIZE: ADJUSTMENT FOR CHANGES IN 
THE PROBABILITIES 


NATHAN KEYFITZ
Dominion Bureau of Statistics 


WE CONSIDER a simplified area sample consisting of one primary
sampling unit chosen from each of a number of strata; each such
selected unit is to be enumerated completely. The sample “take” (i.e., 
the total in the sample of whatever is being surveyed) can be converted 
to an estimate of the population in each stratum simply by multiplying 
by the number of units in the stratum; and this is, for practical pur- 
poses, the only method of estimating if no information about the un- 
sampled units other than their number is on hand. If, however, some 
evidence is available on the relative “sizes” of the units in a stratum one 
would like to incorporate this in the estimating procedure. In an 
application which has now become fairly common, a number of differ- 
ent characteristics are to be estimated by a uniform procedure from a 
single survey, and the measure of size related to all characteristics is 
the number of persons in the several units at a preceding census. 

The Canadian Labor Force sample, like that used for the Monthly 
Report on the Labor Force of the United States, consists of up to four 
successive stages;¹ the theory here considered is applicable to the first
stage. In the first stage the entire rural territory of Canada was divided 
into about 500 areas constituting primary sampling units, and these 
were assembled into strata each containing between 5 and 10 units. 
There was no requirement that the units comprising a stratum be
contiguous, only that they be as similar as possible in respect
of type of farming, density of population, etc.; nor was any attempt
made to assemble into one stratum those units similar in the numbers 
of people they contained at the preceding census. The sub-sampling 
within selected primary sampling units was arranged so that multiplica- 
tion of the sample take by 100 furnished an unbiased estimate of the 
stratum total; this was accomplished, following the suggestion of 
Hansen and Hurwitz, by sub-sampling in a ratio which depended on 
which unit was selected, so as to give a fixed expected number in the 
sample, i.e., 0.01 of the measure of size of the stratum. The application 
of the theorem of this paper, however, is not affected by sub-sampling; 
we may think of the estimating procedure as a matter of multiplying 


1 U.N. Sub-Commission on Statistical Sampling—Recommendations concerning the preparation of 
reports of sampling surveys—E/CN.3/52/Add. 1, p. 4. 
the sample take of the selected unit by the ratio of total size (1941
population) of the stratum to that of the selected unit. 
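A short calculation shows why multiplying the take by 100 is unbiased. Here $r_i$ denotes the within-unit sub-sampling ratio used when unit $i$ is selected. The symbol is introduced only for illustration. The sketch assumes $r_i \le 1$ and that a sub-sample's expected take is $r_i X_i$, where $X_i$ is the complete take of unit $i$ and $P_i$ is its measure of size, in the notation of the next paragraph.

$$\pi_i = \frac{P_i}{\sum P}, \qquad r_i = \frac{0.01 \sum P}{P_i}, \qquad r_i P_i = 0.01 \sum P,$$

$$E(100 \times \text{take}) = \sum_i \pi_i \cdot 100\, r_i X_i = \sum_i \frac{P_i}{\sum P} \cdot \frac{\sum P}{P_i}\, X_i = \sum_i X_i .$$

The expected sample, measured on the scale of $P$, is therefore the same whichever unit is drawn.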

In the model we are considering, where the selected unit is to be
enumerated completely, there are M units in a stratum and the i-th unit
has been selected; the simplest estimate is $MX_i$, where $X_i$ is the sample
take in the i-th unit. The measure of size, P, permits the alternative
estimate $(\sum P/P_i)X_i$, whose expected value is $\sum_i \pi_i (\sum P/P_i) X_i$,
where $\pi_i$ is the probability which has been assigned the i-th unit in the
initial selection, and whose variance is


$\sum_i \pi_i \{(\sum P/P_i) X_i - \sum X\}^2$.


A set of $P_i$ as nearly as possible proportional to the $X_i$ will minimize the
variance of the estimate.

In order that the estimate using the measure of size may be unbiased,
it is necessary that the random choice of a unit from the stratum be
made with probability proportional to the measure of size which is to
be used in the estimating. In this case $\pi_i = P_i/\sum P$ and the
expected value of the estimate is


$\sum_i [(P_i/\sum P)(\sum P/P_i) X_i] = \sum_i X_i$.
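The mean and variance described in the two preceding paragraphs can be checked exactly on a small example. The sketch below is illustrative only. The stratum, the helper name `pps_moments`, and the integer sizes are hypothetical. It uses exact fractions so that unbiasedness shows up as an exact equality.

```python
from fractions import Fraction


def pps_moments(sizes, takes):
    """Exact mean and variance of the estimate (sum P / P_i) * X_i when a single
    unit i is drawn with probability pi_i = P_i / sum P.

    sizes: positive integer measures of size P_i; takes: totals X_i.
    """
    total_p = sum(sizes)
    total_x = sum(takes)
    probs = [Fraction(p, total_p) for p in sizes]
    estimates = [Fraction(total_p, p) * x for p, x in zip(sizes, takes)]
    mean = sum(pi * est for pi, est in zip(probs, estimates))
    # Deviations are taken about the stratum total, which is also the mean of
    # the estimate under PPS selection, so this is its variance.
    var = sum(pi * (est - total_x) ** 2 for pi, est in zip(probs, estimates))
    return mean, var


# Hypothetical 4-unit stratum.
mean, var = pps_moments([1, 2, 3, 4], [10, 25, 28, 41])
print(mean)        # 104, the stratum total
_, var_prop = pps_moments([1, 2, 3, 4], [10, 20, 30, 40])
print(var_prop)    # 0, since each X_i is proportional to P_i
```

The second call shows the limiting case noted above. When the measure of size is exactly proportional to the take, every possible draw gives the stratum total.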


All of this is stated or clearly implied by Hansen and Hurwitz in 
their fundamental paper, “On the Theory of Sampling from a Finite
Population,” Annals of Mathematical Statistics, December, 1943. The
purpose of the present paper is to describe a device for changing to a 
new set of probabilities. 

New information will be available in 1951 on the relative sizes of the 
primary sampling units in each stratum and it is intended to make use 
of this information. An unbiased procedure would be to select afresh 
one unit within each stratum, with probability proportional to the 
newly obtained measure of size and then to use this measure of size in 
estimating. It is the case, however, that a substantial investment has 
been made in the form of lists of households in the selected units; an 
administrative requirement is, therefore, that as few as possible of the 
originally selected units be changed. This paper presents a device for 
adjusting probabilities so that the selected unit is chosen with proba- 
bility proportional to the 1951 measure of size with the new unit the 
same as the old one in as many of the strata as possible. One could 
incorporate the new information by making ratio or regression esti- 
mates but these would complicate the office procedures, which is es- 
pecially undesirable where speed is an objective. 
We suppose that in a stratum consisting of units A, B, C, and D, a
sample of one unit has been originally drawn with probabilities $\alpha$, $\beta$, $\gamma$,
and $\delta$, respectively, assigned to the several units. In accordance with
more recent data at hand, the probabilities of the several units are to
be changed to a, b, c, and d, respectively ($\alpha + \beta + \gamma + \delta = a + b + c + d = 1$).
The device to be presented is applicable to any number of sampling
units. Let us suppose that $a > \alpha$, $b > \beta$, $c < \gamma$, and $d < \delta$, although in
respect of these inequalities also the result is general.

The first step is to note whether the originally selected unit is one 
whose probability is to be increased or to be decreased, i.e., whether 
it is A or B on the one hand, or C or D on the other. If it is A or B, 
nothing further is done. If it is C or D, some chance of changing must 
be introduced. The required probabilities of changing are $(\gamma - c)/\gamma$ and
$(\delta - d)/\delta$ respectively. If such a required probability is 0.07, for example,
we in effect ask a table of random numbers (e.g., Fisher and Yates’) 
whether a change should be made by noting in respect of some two 
digit number whether it falls between 01 and 07 inclusive, and if it 
does we interpret the answer as “yes.” 

If a change is to be made, the unit C or D will be dropped, and either 
A or B will take its place; we must next ascertain which of A or B is to 
be chosen. To do so we draw with probability $(a - \alpha)/(a - \alpha + b - \beta)$. If
the table replies “yes” the choice falls on A; if “no” it falls on B. 
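The two-stage draw just described can be sketched in Python. A pseudo-random generator stands in for Fisher and Yates' table. The function name and the dictionary layout are hypothetical. The paper treats two gaining units, A and B. Extending it to any number of gaining units, chosen in proportion to their increase in probability, is an assumption here, consistent with the remark that the result is general.

```python
import random


def adjust_selection(old, new, selected, rng):
    """Keep or replace the originally selected unit so that the final unit is
    drawn with the new probabilities.

    old, new: dicts mapping unit -> probability (each summing to 1).
    selected: the unit originally drawn with the old probabilities.
    rng: a random.Random instance standing in for a table of random numbers.
    """
    if new[selected] >= old[selected]:
        return selected                      # probability not reduced: keep it
    p_change = (old[selected] - new[selected]) / old[selected]
    if rng.random() >= p_change:
        return selected                      # the random draw says "no change"
    # Move to a unit whose probability rose, in proportion to the increase.
    gainers = [u for u in old if new[u] > old[u]]
    weights = [new[u] - old[u] for u in gainers]
    return rng.choices(gainers, weights=weights, k=1)[0]


old = {"A": 0.07281, "B": 0.32310, "C": 0.29267, "D": 0.31142}  # 1941 measure
new = {"A": 0.08202, "B": 0.33509, "C": 0.27980, "D": 0.30309}  # 1946 measure
print(adjust_selection(old, new, "D", random.Random(1951)))
```

If the original unit is itself drawn at random with the old probabilities, repeating the whole process many times should reproduce the new probabilities.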

It is easily verified that with this procedure each unit has the
required probability of selection. The probability of A, for example, is
equal to $\alpha$, the original chance of selection, plus some increment in
respect of the probability that either C or D was originally chosen. These
probabilities are $\gamma$ and $\delta$. Multiplying them respectively by the
probabilities that we decided to make a change, $(\gamma - c)/\gamma$ and $(\delta - d)/\delta$, and
again by $(a - \alpha)/(a - \alpha + b - \beta)$, which is the probability that, if a change
was to be made, the new unit would be A rather than B, we find (on applying
the equality $a - \alpha + b - \beta = \gamma - c + \delta - d$) that the new over-all probability
attained is a.
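Written out for unit A, and for unit C, which can only lose probability, the verification reads:

$$\Pr(A) = \alpha + \left[\gamma \cdot \frac{\gamma - c}{\gamma} + \delta \cdot \frac{\delta - d}{\delta}\right] \frac{a - \alpha}{a - \alpha + b - \beta} = \alpha + (\gamma - c + \delta - d)\,\frac{a - \alpha}{a - \alpha + b - \beta} = \alpha + (a - \alpha) = a,$$

$$\Pr(C) = \gamma \left(1 - \frac{\gamma - c}{\gamma}\right) = c .$$

The middle step uses $\gamma - c + \delta - d = a - \alpha + b - \beta$, which holds because both sets of probabilities sum to 1.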

It follows that the probability of changing is the sum of the
differences between the old probabilities and the new for all of those units
where the probabilities are to be increased, i.e., $a - \alpha + b - \beta = \gamma - c + \delta - d$.
Is there any method of attaining the new probabilities which
is less likely to make a change? 

Suppose that C was originally chosen and that it is possible to use
some probability of changing, p, which is smaller than $(\gamma - c)/\gamma$. The
combined probability of first choosing C and then replacing it is
therefore $\gamma p$, and its over-all probability of ending up as the selected unit
(i.e., after the drawing to find whether a change is to be made) is
$\gamma(1 - p)$. To meet the requirements of the problem this must equal c;
but it is in fact greater than c, as can be seen by applying the inequality
$p < (\gamma - c)/\gamma$; C has therefore been given too great a chance of final
inclusion. A similar proof applies in respect of the possibility that D has
been originally chosen. If $(\gamma - c)/\gamma$ and $(\delta - d)/\delta$ are the smallest
probabilities that can be given to changing from C and from D, then the
over-all probability of making a change cannot be less than
$\gamma[(\gamma - c)/\gamma] + \delta[(\delta - d)/\delta]$, i.e., than $\gamma - c + \delta - d$.

This method may be contrasted with making a fresh start, which
would retain the original units in only $\alpha a + \beta b + \gamma c + \delta d$ of cases. In the
extreme case where $\alpha = \beta = \gamma = \delta = a = b = c = d = 1/4$, a fresh start would
change units 3/4 of the time while the above procedure would never
require a change.

One may exemplify the procedure with a 4-unit stratum in the 
Prairie Provinces where the Census of 1941 on which the sample was 
originally drawn became obsolete with the 1946 quinquennial census. 
Table 1 shows the old and new distribution of probabilities and the 
procedure for drawing. In this case the probability of making a change 
on the device described is 0.021 and with a fresh start is 0.709. 


TABLE 1 


ADJUSTMENT OF PROBABILITIES FROM 1941 TO 1946 CENSUS DATA FOR 
A FOUR-UNIT STRATUM 


Unit                      Probability Using 1941      Probability Using 1946
                          Census Measure of Size      Census Measure of Size
A                         $\alpha = 0.07281$          $a = 0.08202$
B                         $\beta = 0.32310$           $b = 0.33509$
C                         $\gamma = 0.29267$          $c = 0.27980$
D (originally chosen)     $\delta = 0.31142$          $d = 0.30309$


Probability of change since D was originally chosen: $(\delta - d)/\delta = 0.027$
Probability of changing to A if a change is made: $(a - \alpha)/(a - \alpha + b - \beta) = 0.434$
Probability of changing to B if a change is made: $(b - \beta)/(a - \alpha + b - \beta) = 0.566$
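The figures quoted for this stratum can be recomputed from the two columns of Table 1. The sketch below is illustrative, and the helper names are hypothetical. It follows the text in treating the fresh start as an independent new draw, so the original unit survives with probability $\alpha a + \beta b + \gamma c + \delta d$.

```python
# Probabilities from Table 1 (1941 and 1946 census measures of size).
OLD = {"A": 0.07281, "B": 0.32310, "C": 0.29267, "D": 0.31142}
NEW = {"A": 0.08202, "B": 0.33509, "C": 0.27980, "D": 0.30309}


def change_probabilities(old, new):
    """Return (chance of a change under the device, chance under a fresh draw)."""
    # Device: total probability shifted away from units whose probability falls.
    device = sum(old[u] - new[u] for u in old if old[u] > new[u])
    # Fresh start: the old unit survives only if an independent new draw repeats it.
    fresh = 1.0 - sum(old[u] * new[u] for u in old)
    return device, fresh


def replacement_shares(old, new):
    """Chance of moving to each unit whose probability rises, given a change."""
    gains = {u: new[u] - old[u] for u in old if new[u] > old[u]}
    total = sum(gains.values())
    return {u: g / total for u, g in gains.items()}


device, fresh = change_probabilities(OLD, NEW)
print(round(device, 3), round(fresh, 3))                  # 0.021 0.709
print(round((OLD["D"] - NEW["D"]) / OLD["D"], 3))         # 0.027
print({u: round(s, 3) for u, s in replacement_shares(OLD, NEW).items()})
# {'A': 0.434, 'B': 0.566}
```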


The device serves also to draw with probability proportional to size 
in a latin square. The latin square may be made up with approximately 
equal probabilities in the different cells by grouping pairs of units into 
one cell where the units are small, and leaving for the end the possible 
selection between the two units with probability proportional to their 
sizes. By this method a table showing 16 cells was arranged for the 
Province of New Brunswick in the original drawing of the Labor Force 
Survey. A selection of four units out of the sixteen was made with equal
probabilities with the help of Fisher and Yates tables,² and for the
several units in each of the four strata the probabilities were adjusted 
by the device here described. In proportion as the initial attempt to 
make the probabilities all equal to 1/4 has been successful, the prob- 
ability of disrupting the latin square in the adjustment of probabilities 
is small. Even where the chance of disruption is as high as 0.2, as was 
the case for the New Brunswick selection, there seems an advantage in 
applying the method since 0.8 of the time the two-way stratification 
will be retained. 





2 R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical Research.
Edinburgh, Oliver and Boyd. 





A SOURCE OF BIAS IN ONE OF THE SAMPLES 
OF THE 1950 CENSUS 


PETER O. STEINER
University of California (Berkeley) 


WHILE in general the procedures employed in the 1950 Census of
Population seem designed to minimize avoidable bias in the
results, both through the design of the questions¹ and through the
instructions to the enumerators,² one of the smaller samples seems to me
to be subject to a type of systematic bias that may render
interpretation of the results hazardous.

I should like to make it perfectly clear at the outset that the decision 
by the Bureau of the Census to employ the sampling procedure here- 
after discussed was doubtless a judgment made in full awareness of the 
source of bias thus introduced, after careful consideration of the rela- 
tive importance of the data, the costs of alternative procedures and 
other relevant factors. My purposes are: (1) to call the attention of users
of data thus collected to this source of bias, (2) to invite examination 
of the quantitative importance of this source of bias, and (3) to invite 
discussion of the ways in which the effects of this bias may be circum- 
vented. 

While the census aims at complete enumeration of the population, it 
seems certain that owing to unavoidable omissions the enumerated 
population differs from the true population. This note ignores bias that 
may be introduced because of this, and deals with bias only insofar as 
the samples are not representative of the enumerated population. The 
enumerated population is used to obtain two samples, one which se- 
lects every fifth person and one which selects every thirtieth person for 
special enumeration. For convenience these will be designated as the 
20% sample and the 3% sample respectively. While these samples 
overlap (i.e., the 3% sample is part of the 20% sample) it is with the 
smaller sample that this note is concerned. 

This 3% sample is designed to secure two distinct classes of informa- 
tion: (1) additional data on marital history, and (2) the last occupation 
of people who have left the work force in the last year, that is,
people employed (part or full time) in 1949 who are neither currently
employed nor looking for work. It appears to me that for each of these 





1 See Form P 1, U. S. Department of Commerce, Bureau of the Census, 1950 Census of Population
and Housing.

2 See Urban Enumerator’s Reference Manual, 1950 Census of the United States, U. S. Dept. of
Commerce, Bureau of the Census, Washington, D. C.
purposes the group subject to questioning on these matters is a biased
sample of the enumerated population and that, at least in the second
case, this may critically affect the results. The contention rests upon
the method of selection of the 3% sample. The basic Form P 1 provides
30 lines for enumeration of population, and on the reverse side 12 lines 
for enumeration of dwelling units. The 3% sample is drawn from a pre- 
designated line of the last five on the population side—i.e., from lines 
26 to 30. This sample is used to obtain the data on marriage history 
and last employment of people who have now left the work force. 
Marriage data are acquired for a fairly large proportion of those in the 
3% sample: all people over 14 who have ever been married. The work 
force data cover a much smaller group: those over 14 who worked in 
one or more weeks in 1949 and who were neither employed nor looking 
for work in the week prior to the enumeration. 

There would be no cause for special concern with the 3% sample if it 
were representative of the entire population enumerated—it would 
then contain essentially the same biases as the enumerated population 
as a whole, and while these might be more serious in the sample they
would clearly be of the same kind. The 3% sample, however, is not 
representative of the entire enumerated population because it does not 
in fact take every thirtieth person. This occurs because the sample line 
is not always filled. This may occur in several ways, not all of which 
introduce bias: 

1. The line is left blank at the end of the enumeration of the district.³
This is obviously trivial—it reduces the size of the sample, but does not 
alter its composition. 

2. It represents a vacant dwelling unit. This again is not significant 
as there is no reason to expect this to occur more frequently on the 
designated line than on any other. 

3. The sample line is unfilled because it bears the notation “no one 
at home.” There is again no reason to expect this to occur with greater 
frequency on the sample line than elsewhere. In such cases, however, a 
“call-back” is required and the household is eventually enumerated on 
an “out of order” sheet. Since “no one at home” is more likely to occur 
where the number of members of the household is small (or where all 
members are employed) the enumeration on out of order sheets is 
likely to consist of households having a relatively low number of
occupants per dwelling unit. For this reason these sheets have an increased
likelihood of falling into the class described in 4 below.





3 This may occur twice in each district, once on the regular sheets and once on the “out of order”
sheets. 





112 AMERICAN STATISTICAL ASSOCIATION JOURNAL, MARCH 1951 


4. The last lines are left vacant because of the prior completion of 
the housing side of the P 1 form. This occurs whenever 12 housing units 
are enumerated having fewer than the appropriate total number of
inhabitants (26 to 30, depending upon which is the designated sample line).
This in turn can come about in either of two ways, or through a combination
of them:

(a) when the average number of persons per occupied dwelling unit 
(hereafter called the occupancy ratio) is less than 2.17 to 2.50 (depending
again upon which is the sample line). While an occupancy ratio of less 
than 2.5 may seem small, it is not infrequently encountered. Indeed, in 
preliminary data for an urban area of over 300,000, the only area for 
which I have any information, the over-all occupancy ratio was below 
3.0, and it is apparent that in some residential districts, for example 
those characterized by apartment buildings composed of small sized 
dwelling units or those occupied by a predominantly aged population, 
the occupancy ratio will be substantially below the over-all average. 

(b) when there are vacant dwelling units. Such vacant units (in- 
cluding those occupied by non-residents) take one line on each side of 
the P 1 schedule, and obviously increase the occupancy ratio needed to 
assure filling of the sample line. 
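The arithmetic of 4(a) and 4(b) can be made concrete with a toy model of one P 1 sheet. The sketch below is a hypothetical illustration, not the Bureau's procedure. It assumes households are entered in order, that each occupied dwelling unit takes one population line per person, and that a vacant unit takes one population line. It also assumes the sheet closes once 12 dwelling units have been entered. "No one at home" households and out-of-order sheets are ignored.

```python
HOUSING_LINES = 12      # dwelling-unit lines on the housing side of Form P 1
POPULATION_LINES = 30   # person lines on the population side


def threshold_ratio(sample_line):
    """Average persons per dwelling unit needed to reach the sample line
    when all 12 dwelling units on the sheet are occupied."""
    return sample_line / HOUSING_LINES


def sample_line_has_person(occupants, sample_line):
    """Does a person fall on the designated sample line of one sheet?

    occupants: persons in each successive dwelling unit (0 = vacant unit,
    which still takes one line on the population side).
    """
    line = 0
    for persons in occupants[:HOUSING_LINES]:   # a full housing side closes the sheet
        used = persons if persons > 0 else 1
        if persons > 0 and line < sample_line <= line + used:
            return True
        line += used
        if line >= POPULATION_LINES:
            break
    return False


print([round(threshold_ratio(k), 2) for k in range(26, 31)])
# [2.17, 2.25, 2.33, 2.42, 2.5]
print(sample_line_has_person([3] * 12, 26))            # True
print(sample_line_has_person([2] * 12, 26))            # False: only 24 lines used
print(sample_line_has_person([3] * 8 + [0] * 4, 26))   # False: line 26 is a vacancy
```

The last call shows case 4(b). The same three-person households that fill line 26 on a fully occupied sheet miss it when four vacancies follow the eighth household.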

In combination, 4(a) and 4(b) are of more than trivial concern. The
enumeration districts correspond to geographical areas which are
designed, in urban areas at least, to include up to 1,000 persons.
Enumeration within a district proceeds in a systematic block-by-block
progression, with the result that the typical P 1 sheet covers, at least in urban
areas, a very small geographic area; indeed it does not take a very
large apartment house to cover one or more sheets.

It is a well known and easily demonstrable fact that small geographic 
units in residential areas—blocks or streets—are much more homogene- 
ous with respect to the income, racial origin, religion, and occupation of 
the inhabitants than is the population as a whole, or the population of 
an entire urban community. There is directly associated with this a 
very marked difference in the occupancy ratio among such districts, 
and there is further (though at present this may be a minor matter) a 
marked difference in incidence of vacancies. For these reasons one 
would expect that the number of cases where the sample line is left 
unfilled by prior completion of the housing side would show substantial 
variation from district to district, and particularly among districts 
occupied largely by people with different types of occupational experi- 
ence. 

Since there is substantial reason to expect marked differences among
various sub-groups of the population with respect both to marital
history and to occupation of those having left the work force in the last
year, a sampling situation which distorts representation of various
groups renders the resulting sample exceedingly difficult to interpret.
If such a sample is sufficiently large it is possible to subdivide it into 
more homogeneous groups and avoid much of the error otherwise in- 
troduced. This may be the case with the marriage data. But where the 
sample is very small, as will be the case with withdrawals from the 
work force, this may introduce more hazards than it eliminates. 

Obviously the quantitative importance of this objection can be tested 
only by those with access to the census enumeration sheets. A small 
number of inquiries of local census authorities suggests that the varia- 
tion among districts in percentage of prior completions of the housing 
side of the P 1 form is substantial. If this proves to be generally true,
the source of bias mentioned will be significant.⁶





4 The questions: “Has this person been married more than once?”; “How many years since this
person was (last) married, widowed, divorced, or separated?” and, if a woman, “How many children has
she ever borne, not counting stillbirths?” 

5 The questions for such people: “What kind of work did this person do in his last job?”; “What 
kind of business or industry did he work in?” “Class of worker (for private employer, for government, 
in own business, without pay on family farm or business).” 

6 Mr. Morris H. Hansen, Assistant Director for Statistical Standards, Bureau of the Census, has
been kind enough to read this paper and to provide the following helpful information and comments: 

“The frequency with which the problem occurs is small. Preliminary examination of the returns 
indicates [that] between 3 and 4 per cent of the sheets were not initially represented in the sample 
because of this problem, although the percentage varies from area to area.” 

With respect to this source of bias Mr. Hansen comments as follows: 

“In examining the returns, the schedules on which no one could be included in the 3⅓ per cent
sample [which I have called the 3 per cent sample] because the housing side of the schedules were filled are
being identified, and a 3⅓ per cent sample of persons from such schedules are being added to the sample
initially returned by the enumerators. This eliminates the bias, insofar as the people designated for
inclusion in the sample are concerned. The supplemental information is being requested for those added 
to the sample, and particular attention will be given to the areas where the proportion of uncompleted 
sheets is high enough that the effect of incomplete responses can be appreciable. This procedure was 
regarded as more convenient than the alternatives considered that avoided the bias in the initial field 
collection.” 

Mr. Hansen also writes: “This sort of bias in initial sample designation should be sharply distin- 
guished from types of bias that cannot be explicitly identified in the returns. In addition to this type of 
potential bias, which is readily detectable and can be eliminated, there is the possibility that other 
nonrandom processes may affect both the 20 per cent and the 3⅓ per cent sample. For instance, although
the order of enumeration of households and of persons within households was carefully prescribed in the 
enumerators’ instructions, it is unlikely that these instructions were followed perfectly; thus, it is 
probable that enumerators exercised some degree of control, conscious or unconscious, in determining
the persons to be enumerated on sample lines. Consequently, it is possible that the income or other sup- 
plementary questions for persons on the sample lines may have influenced the enumerators to deviate 
from the prescribed procedure in some instances. 

“These and other possible biases whose occurrence cannot fully be detected by an examination of 
the schedule before processing will be reflected in some degree in the tabulations. The 1940 experience 
and pretest results gave evidence that this problem might not be serious. However, the Bureau is under- 
taking an intensive study, including not only analyses of the tabulations, but also a special recanvass of 
some 4000 small areas throughout the United States, in an effort to obtain quantitative measures of the 
effect of such biases, as well as to measure completeness of coverage, and response errors and varia- 
tions in both census and sample results. The major findings of this study will be reported in the regular 
census volumes and special reports.” 





THE DISTRIBUTION OF BLOCKS IN AN UNCONGESTED 
STREAM OF AUTOMOBILE TRAFFIC 


MORTON S. RAFF
U. S. Bureau of Public Roads* 


The time instants at which the cars in a stream of uncon- 
gested traffic pass a point on the roadway are distributed at 
random. To a pedestrian trying to cross the street or to a 
driver on a minor side-street, the traffic stream may be viewed 
as an alternating succession of periods which permit crossing 
and periods which do not. Probability distributions of the 
lengths of these periods are developed and discussed. 


INTRODUCTION 


THE flow of uncongested traffic past a particular location along a
ype has been found [references 1, 2] to be distributed at ran- 
dom in the following sense: for any given period of time the number of 
passages—i.e., the number of cars passing the location—has a Poisson 
distribution, and for any given number of passages the instants of 
passage are uniformly and independently distributed. If N is the
average number of cars passing the location in unit time (traffic engineers
call this quantity the traffic volume), the probability that a time
interval of length t will contain exactly k cars is


$\dfrac{(Nt)^k}{k!}\, e^{-Nt}$.   (1)


STATEMENT OF THE PROBLEM 


The present problem arises from the consideration of traffic be- 
havior at intersections controlled by Stop signs. In the idealized de- 
scription of such an intersection it is assumed that the cars on the main 
street pass the intersection in accordance with (1), without interference 
from the side-street traffic. The arrival time of each side-street car is 
assumed to be arbitrary; however, the time at which a side-street car 
actually enters the intersection depends upon the time interval be- 
tween its arrival and the passage of the next main-street car. If this 
interval is longer than a fixed quantity L, called the critical lag, the 
side-street car proceeds without waiting; otherwise it waits until a car- 
free interval longer than L becomes available. 





* This research was done at the Bureau of Highway Traffic, Yale University, on a grant from the 
Eno Foundation for Highway Traffic Control. 
From the point of view of a side-street car, then, the main-street
traffic can be thought of as an alternating succession of periods during
which crossing is impossible and periods during which crossing is
possible. These periods will be called blocks and antiblocks respectively.

Two principal questions will be answered in this paper: (1) What is 
the probability that an instant selected at random is contained in a 
block? (2) What is the probability distribution of block sizes? In 
other words, what is the probability that a block, selected at random 
from the whole set of blocks, will have its size in a preassigned range? 


SEVERAL WAYS OF LOOKING AT A TRAFFIC STREAM


The most direct way to think of a stream of traffic is in terms of the
times of passage of the various cars, as shown in Figure 1. The
probability of finding k of these instants in an interval of length t has
already been stated in formula (1).

Figure 1. Gaps, blocks, and antiblocks in a stream of traffic (instants of passage of cars on the main street).

The next higher level of abstraction is to think not of the instants 
at which the cars pass the intersection but of the time intervals separat- 
ing each car from its successor. These intervals, which will be called 
gaps, are also illustrated in Figure 1. The probability distribution of 
gap sizes [1, 2, 3] is as follows: the probability that a gap selected at 
random is equal to or shorter than t is


$1 - e^{-Nt}$.   (2)


Finally, the traffic stream may be viewed in the way it appears to a 
pragmatic side-street driver, as a succession of blocks and antiblocks 
(see Fig. 1). All the time which precedes the passage of any car by L 
or less is contained in blocks, while the rest of the time is contained in 
antiblocks. 
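This picture of the stream can be checked numerically. The Python sketch below is offered only as an illustration, not as the author's method. It generates a Poisson stream with hypothetical values of N and L and marks the stretch of length L before each passage as blocked. It then reports the blocked share of a long period, which can be compared with formula (7) derived below.

```python
import math
import random


def simulate_block_fraction(n_rate, crit_lag, horizon, seed=0):
    """Monte Carlo share of [0, horizon] lying in blocks, i.e. within crit_lag
    before the passage of some main-street car, for a Poisson stream of rate n_rate."""
    rng = random.Random(seed)
    blocked = 0.0
    prev = 0.0          # previous passage; blocked time before it is already counted
    t = 0.0
    while True:
        t += rng.expovariate(n_rate)       # exponential gap with mean 1 / n_rate
        if t - crit_lag >= horizon:        # this car's block starts after the window
            break
        start = max(t - crit_lag, prev)    # avoid double counting overlapping blocks
        end = min(t, horizon)
        if end > start:
            blocked += end - start
        prev = t
    return blocked / horizon


n_rate, crit_lag = 0.2, 5.0              # hypothetical: N = 0.2 cars/s, L = 5 s, NL = 1
print(simulate_block_fraction(n_rate, crit_lag, 500_000.0, seed=1))
print(1 - math.exp(-n_rate * crit_lag))  # formula (7): about 0.632
```

Overlapping stretches merge into a single block, so the union of the intervals is measured rather than their sum.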
PROBABILITY THAT AN INSTANT IS CONTAINED IN A BLOCK


It seems desirable at this point to introduce the notion of a long
finite period of time having the property that the frequency
distributions of gaps, antiblocks, and blocks in the finite period are equal to
the expected frequencies based on the probability distributions. The 
total number of car passages in this period will be denoted by the letter 
A. Since N is the average number of cars per unit time, the total length 
of the period is A/N. 

In terms of the finite period the first of the two questions becomes, 
what fraction of the total time is contained in blocks? Since the total 
time is known, this is equivalent to asking how much time is contained 
in blocks. Finally, since all the time not belonging to blocks is contained 
in antiblocks, an equivalent question is, what is the total time con- 
tained in antiblocks? 

It is apparent that each antiblock is a part of some gap which is 
longer than L. Conversely, in every gap of length L + t, where t > 0,
the first t of the gap is an antiblock. Thus there is a one-to-one
correspondence between antiblocks, on the one hand, and gaps longer than
L, on the other. The number of antiblocks longer than t is the same as
the number of gaps longer than L + t, which is, from formula (2),


$A e^{-N(L+t)}$.   (3)

The total number of antiblocks, clearly, is

$A e^{-NL}$;   (4)


hence the number of antiblocks per unit time is $N e^{-NL}$ and the number
of antiblocks equal to or shorter than t is


$A e^{-NL}(1 - e^{-Nt})$.   (5)


The total time in antiblocks is obtained by multiplying t by the
differential of expression (5) and integrating from 0 to ∞. The result is


$\dfrac{A}{N}\, e^{-NL}$,   (6)


from which the probability that an instant is contained in an antiblock
is seen to be $e^{-NL}$.

Thus the probability that an instant selected at random is contained
in a block is


$1 - e^{-NL}$.   (7)
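Written out, the integration behind (6) is the mean of an exponential distribution. The derivation uses only the quantities A, N, L, and t defined above.

$$\int_0^\infty t \; d\!\left[A e^{-NL}\left(1 - e^{-Nt}\right)\right] = A e^{-NL} \int_0^\infty N t\, e^{-Nt}\, dt = \frac{A}{N}\, e^{-NL},$$

$$\Pr(\text{antiblock}) = \frac{(A/N)\, e^{-NL}}{A/N} = e^{-NL}, \qquad \Pr(\text{block}) = 1 - e^{-NL}.$$

With NL = 1, for example, an instant lies in a block with probability $1 - e^{-1} \approx 0.632$.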













BLOCKS IN STREAM OF TRAFFIC 117 


RELATIONSHIP BETWEEN DISTRIBUTION OF BLOCK 
SIZES AND DISTRIBUTION OF WAITING TIMES


The derivation of the probability distribution of block sizes is con- 
siderably more difficult than in the case of antiblocks, because of the 
absence of any simple relationship between a particular block and any 
gap or group of gaps (see Fig. 1). For this reason the answer will be 
approached indirectly, through the distribution of waiting times for the 
side-street cars. The two distributions are intimately related, as the 
next three paragraphs will make clear. 

Let B(t) be the probability that a block is equal to or shorter than t. 
Let F(t) be the probability that the waiting time of a side-street car is
equal to or shorter than t. Clearly $F(0) = e^{-NL}$, since a car arriving
during an antiblock has zero waiting time.

The probability that the waiting time T is in the range $u < T \le u + du$
is the probability that an instant is in a block and precedes the end of
the block by an amount of time which is between u and u + du. This is
equal to du times the average number of blocks per unit time which are
longer than u, i.e., $N e^{-NL}[1 - B(u)]\,du$, since blocks and antiblocks
alternate, so that there are $N e^{-NL}$ blocks per unit time. Thus


$F(t) = F(0) + N e^{-NL} \int_0^t [1 - B(u)]\,du = e^{-NL}(1 + Nt) - N e^{-NL} \int_0^t B(u)\,du$.   (8)


To express B(t) in terms of F(t), one need only differentiate both
sides of (8) with respect to t and solve the resulting equation for B(t).
The formula is:

$$B(t) = 1 - \frac{e^{NL}}{N}\,F'(t). \qquad (9)$$
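The differentiation behind (9) takes one line: by the fundamental theorem of calculus the derivative of the integral in (8) is B(t), so

$$F'(t) = N e^{-NL} - N e^{-NL} B(t) = N e^{-NL}\left[1 - B(t)\right]
\qquad\Longrightarrow\qquad
B(t) = 1 - \frac{e^{NL}}{N}\,F'(t).$$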





DISTRIBUTION OF WAITING TIMES 
The cumulative distribution of waiting times follows directly from a
formula developed by Garwood [4]. The result is

$$F(t) = \sum_{i=0}^{h-1} (-1)^i e^{-(i+1)NL} \left\{ \frac{[N(t - iL)]^i}{i!} + \frac{[N(t - iL)]^{i+1}}{(i+1)!} \right\} \quad \text{for } (h-1)L \le t \le hL. \qquad (10)$$

It will be noted that this expression is a finite series in which the number
of terms increases with increasing values of t.
While F(t) appears to depend on the three variables N, L, and t, it is
really a function of only the two variables NL and t/L. These variables 
represent, respectively, the traffic volume and the waiting time, where 
the time unit is the critical lag. 
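Equation (10) is easy to evaluate directly. The Python below is a minimal sketch, not the computation used for Table I; the function names are hypothetical, and each power-over-factorial is formed in log space so that long series (the text mentions nearly 300 terms) cannot overflow. Because F depends only on NL and t/L, calling it with L = 1 and N = NL reproduces any column of the table.

```python
import math


def _power_over_factorial(x, k):
    """x**k / k!, evaluated in log space so that large k cannot overflow."""
    if x == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(x) - math.lgamma(k + 1))


def waiting_time_cdf(t, N, L):
    """F(t) of equation (10): probability that a side-street car waits at most t.

    N is the main-stream traffic rate and L the critical lag; F(t) = 0 for t < 0.
    """
    if t < 0:
        return 0.0
    h = math.floor(t / L) + 1  # (h - 1)L <= t < hL, so the series has h terms
    total = 0.0
    for i in range(h):
        x = max(N * (t - i * L), 0.0)  # guard against tiny negative rounding
        total += (-1) ** i * math.exp(-(i + 1) * N * L) * (
            _power_over_factorial(x, i) + _power_over_factorial(x, i + 1)
        )
    return total


if __name__ == "__main__":
    # F depends only on NL and t/L, so N = NL and L = 1 cover every case.
    for NL in (0.5, 1.0, 3.0):
        row = [round(waiting_time_cdf(r, NL, 1.0), 3) for r in (0, 0.5, 1, 1.5, 2)]
        print(f"NL = {NL}: {row}")
```

For instance, `waiting_time_cdf(1.5, 3.0, 1.0)` (NL = 3, t/L = 1.5) comes to about 0.2673, in line with the entry .267 in Table I.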

Computed values of F(t) are given in Table I and illustrated graphi- 
cally in Figure 2. Values of NL range from 0 to 6, covering all the 
values which are likely to arise in actual traffic situations. For each 


TABLE I
VALUES OF F(t)

t/L \ NL   0   .25   .50   .75   1   1.25   1.5   1.75   2   2.5   3   3.5   4   4.5   5   5.5   6

0      1 .779 .606 .472 .368 .286 .223 .174 .135 .082 .050 .030 .018 .011 .007 .004 .002
0.5    1 .876 .758 .650 .552 .466 .390 .326 .271 .185 .124 .083 .055 .036 .024 .015 .010
1      1 .974 .910 .827 .736 .645 .558 .478 .406 .287 .199 .136 .092 .061 .040 .027 .017
1.5    1 .990 .958 .904 .835 .756 .674 .592 .514 .376 .267
2      1 .998 .983 .951 .901 .836 .762 .683 .603 .455 .330 .233 .161 .109 .073 .049 .032
2.5    1 .993 .940 .889 .826 .753 .676 .523 .387
3      1 .997 .986 .963 .925 .873 .808 .736 .583 .440 .319 .225 .155 .105 .070 .047
4      1 .996 .986 .966 .932 .884 .824 .681 .531 .395 .284 .198 .136 .092 .061
5      1 .995 .984 .964 .930 .883 .756 .608 .463 .338 .240 .165 .112 .075
6      1 .993
7      1 .990 .948 .858 .726 .577 .436 .316 .222 .152 .102
9      1 .991
11     1 .990 .951 .866 .737 .589 .446 .323 .227 .155
17     1 .990 .954 .871 .745 .596 .451 .326 .228
27     1 .992 .961 .884 .761 .613 .465 .336
42     1 .994 .965 .892 .770 .621 .471
64     1 .994 .966 .893 .771 .620
94     1 .993 .963 .885 .759
136    1 .992 .956 .872
197    1 .989 .949
297    1 .989

value of NL, t has been carried high enough to bring F(t) up to 0.99. 
(In one case this requires t to be nearly 300L. Fortunately, however, all
but the first 18 terms of the 300-term series could be neglected without 
affecting the computed result.) 

A number of interesting properties of F(t) will be noted from Table I
and Figure 2. First, it will be seen that F(0) is not zero but $e^{-NL}$, which
is another way of saying that the cars which arrive in antiblocks are
not required to wait at all. Secondly, F(t) rises at a linear rate as t goes
from 0 to L, and it has an abrupt change of slope at the latter value.
Thirdly, as might be expected, F(t) decreases steadily with increasing
traffic volumes.
One fact which is not apparent from the table and the graph is that
for large values of the parameters F(t) approaches the simple exponential
function

$$1 - \left(1 - e^{-NL}\right)\exp\left[-\frac{N e^{-NL}\left(1 - e^{-NL}\right)}{1 - e^{-NL} - NL\,e^{-NL}}\; t\right]. \qquad (11)$$

This approximating function is quite good even for small values of the
parameters. If it had been used for calculating all the numbers in Table
I, the greatest error would have been 0.026.
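The approximation (11) can be set beside the exact series (10) numerically. In this minimal Python sketch (function names are illustrative assumptions; the exact series is re-implemented so the block runs on its own), `exponential_approximation` implements (11) and `waiting_time_cdf` sums (10). By construction both equal $e^{-NL}$ at t = 0.

```python
import math


def _power_over_factorial(x, k):
    """x**k / k!, evaluated in log space so that large k cannot overflow."""
    if x == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(x) - math.lgamma(k + 1))


def waiting_time_cdf(t, N, L):
    """Exact F(t) from the finite series (10); 0 for t < 0."""
    if t < 0:
        return 0.0
    h = math.floor(t / L) + 1
    total = 0.0
    for i in range(h):
        x = max(N * (t - i * L), 0.0)
        total += (-1) ** i * math.exp(-(i + 1) * N * L) * (
            _power_over_factorial(x, i) + _power_over_factorial(x, i + 1)
        )
    return total


def exponential_approximation(t, N, L):
    """Equation (11): exponential approximation to F(t)."""
    p0 = math.exp(-N * L)  # F(0): probability of arriving in an antiblock
    rate = N * p0 * (1 - p0) / (1 - p0 - N * L * p0)
    return 1 - (1 - p0) * math.exp(-rate * t)


if __name__ == "__main__":
    # Largest gap between (10) and (11) over a few Table I rows and columns.
    worst = max(
        abs(waiting_time_cdf(r, NL, 1.0) - exponential_approximation(r, NL, 1.0))
        for NL in (0.25, 0.5, 1, 2, 3, 4, 5, 6)
        for r in (0, 0.5, 1, 1.5, 2, 3, 4, 5)
    )
    print(f"largest difference on this grid: {worst:.3f}")
```

At NL = 1 and t = L, for example, the series gives about 0.7358 and the approximation about 0.7378.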
Figure 2. Cumulative distribution of waiting times (horizontal axis: t/L,
length of wait in critical lags). This graph also gives the cumulative
distribution of block lengths, provided the abscissas are increased by one.


Finally, we shall note a remarkable property of F(t) which comes
directly out of expression (10). The density function is

$$F'(t) = N e^{-NL}\left[1 - F(t - L)\right]; \qquad (12)$$

i.e., F'(t) is a linear function of F(t - L).
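Relation (12) can be checked against the series (10) numerically: a central difference of F at points away from t = L (where F'(t) jumps) should match $N e^{-NL}[1 - F(t - L)]$. The sketch below is illustrative; the function names are hypothetical and the series is re-implemented so the block is self-contained.

```python
import math


def _power_over_factorial(x, k):
    """x**k / k!, evaluated in log space so that large k cannot overflow."""
    if x == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(x) - math.lgamma(k + 1))


def waiting_time_cdf(t, N, L):
    """F(t) from the finite series (10); 0 for t < 0."""
    if t < 0:
        return 0.0
    h = math.floor(t / L) + 1
    total = 0.0
    for i in range(h):
        x = max(N * (t - i * L), 0.0)
        total += (-1) ** i * math.exp(-(i + 1) * N * L) * (
            _power_over_factorial(x, i) + _power_over_factorial(x, i + 1)
        )
    return total


def density_check(t, N, L, step=1e-6):
    """Return (central-difference estimate of F'(t), right-hand side of (12))."""
    numeric = (waiting_time_cdf(t + step, N, L) - waiting_time_cdf(t - step, N, L)) / (2 * step)
    predicted = N * math.exp(-N * L) * (1 - waiting_time_cdf(t - L, N, L))
    return numeric, predicted


if __name__ == "__main__":
    for t in (0.5, 1.7, 2.4, 3.3):  # avoid t = L, where F'(t) jumps
        print(t, density_check(t, 1.0, 1.0))
```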
DISTRIBUTION OF BLOCK SIZES


The cumulative distribution function of block sizes can now be obtained
by combining equations (9) and (12). The probability that a
block is equal to or shorter than t is

$$B(t) = F(t - L), \qquad (13)$$

where F(t) is given by equation (10). Some of the interesting properties
of the block distribution are the following:

(a) There are no blocks shorter than L, i.e., B(t) = 0 when t < L.
This is geometrically apparent, since every block must contain
at least an interval L preceding the car passage which ends the
block.¹

(b) There are a certain number of blocks whose length is exactly
equal to L, i.e., B(t) jumps from 0 to $e^{-NL}$ at t = L. This too is
apparent from geometrical considerations, since a block of
length L occurs whenever there are two consecutive gaps longer
than L (see Fig. 1).

(c) Since the total time in blocks, from formula (7), is
$(A/N)(1 - e^{-NL})$ and the total number of blocks, from formula
(4), is $A e^{-NL}$, the average (expected) value of block length is

$$\frac{1 - e^{-NL}}{N e^{-NL}}. \qquad (14)$$


The average waiting time and also its median value are plotted in
Figure 3. The average and median block lengths can also be read from 
this graph, since it follows from (13) that the average and median block 
lengths exceed the corresponding values of the waiting time by an 
amount L. It will be noted that the median is always less than the 
average, and that both of them take on very large values when the 
traffic volume is at all sizeable. 
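The relations (13) and (14), and the remark that the median wait is smaller than the average, can be checked numerically. In this sketch (function names hypothetical; the series (10) is re-implemented so the block runs on its own) the mean block length from (14) is compared with a trapezoidal integral of 1 - B(t), and the median wait is found by bisection on F(t).

```python
import math


def _power_over_factorial(x, k):
    """x**k / k!, evaluated in log space so that large k cannot overflow."""
    if x == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(x) - math.lgamma(k + 1))


def waiting_time_cdf(t, N, L):
    """F(t) from the finite series (10); 0 for t < 0."""
    if t < 0:
        return 0.0
    h = math.floor(t / L) + 1
    total = 0.0
    for i in range(h):
        x = max(N * (t - i * L), 0.0)
        total += (-1) ** i * math.exp(-(i + 1) * N * L) * (
            _power_over_factorial(x, i) + _power_over_factorial(x, i + 1)
        )
    return total


def block_length_cdf(t, N, L):
    """Equation (13): B(t) = F(t - L), which is 0 for t < L."""
    return waiting_time_cdf(t - L, N, L)


def mean_block_length(N, L):
    """Equation (14): (1 - e^{-NL}) / (N e^{-NL})."""
    return (1 - math.exp(-N * L)) / (N * math.exp(-N * L))


def mean_block_length_numeric(N, L, upper, steps=50_000):
    """Trapezoidal estimate of the integral of 1 - B(t) over [0, upper]."""
    h = upper / steps
    ends = (1 - block_length_cdf(0.0, N, L)) + (1 - block_length_cdf(upper, N, L))
    inner = sum(1 - block_length_cdf(k * h, N, L) for k in range(1, steps))
    return h * (0.5 * ends + inner)


def median_waiting_time(N, L, tol=1e-10):
    """Smallest t with F(t) >= 1/2, found by bisection (F is non-decreasing)."""
    if waiting_time_cdf(0.0, N, L) >= 0.5:
        return 0.0  # at least half the cars arrive in antiblocks and do not wait
    lo, hi = 0.0, L
    while waiting_time_cdf(hi, N, L) < 0.5:
        lo, hi = hi, 2 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if waiting_time_cdf(mid, N, L) < 0.5:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    N, L = 1.0, 1.0  # NL = 1
    mean_wait = mean_block_length(N, L) - L  # blocks exceed waits by L, per (13)
    print("mean block (14):", mean_block_length(N, L))
    print("mean block, numeric:", mean_block_length_numeric(N, L, upper=25.0))
    print("mean wait:", mean_wait, " median wait:", median_waiting_time(N, L))
```

For NL = 1 the median wait works out to (e/2 - 1)L, about 0.36L, because F(t) = e^{-NL}(1 + Nt) below L, while the mean wait is (e - 2)L, about 0.72L.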


APPLICATION 


Studies made on the Merritt Parkway, a high-type facility which is 
capable of handling large volumes of traffic without appreciable con- 
gestion, indicate a close correspondence between the theoretical and 
the observed distributions of block lengths. At a fixed location on the 
parkway, the instants of passage of all the cars were recorded on a 
machine using pens on a moving chart. The gaps were computed from 
this record and listed in chronological order. Then, using an assumed 
value for the critical lag, the blocks were computed, listed in chrono- 
logical order, and listed in order of size; this list was then compared 
with a set of numbers computed from the theoretical formula (13). 
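The tabulation just described can be sketched in a few lines. The Python below is an illustration, not the original procedure: it assumes gaps are measured between consecutive passage times, that each gap g longer than L contributes an antiblock made of its first g - L, and hence that every block is L plus the short gaps lying between two long ones (consistent with properties (a) and (b) above). Because blocks and antiblocks alternate, the theoretical number of blocks is the antiblock count $A e^{-NL}$ of formula (4); multiplying it by B(t) from (13) would give a theoretical column like that of Table II. `simulate_passage_times` is a hypothetical stand-in for the recorded chart.

```python
import math
import random


def block_lengths(passage_times, L):
    """Block lengths implied by a record of passage times, for an assumed critical lag L.

    Each gap g > L holds an antiblock (its first g - L), so a block runs from L before
    the car that closes one long gap to the start of the next long gap: its length is
    L plus the short gaps in between. Incomplete blocks at either end are dropped.
    """
    times = sorted(passage_times)
    gaps = [b - a for a, b in zip(times, times[1:])]
    blocks = []
    current = None  # length of the block in progress; None before the first long gap
    for g in gaps:
        if g > L:
            if current is not None:
                blocks.append(current)
            current = L  # a new block begins L before the car that ends this gap
        elif current is not None:
            current += g
    return blocks


def expected_block_count(A, N, L):
    """Theoretical number of blocks, equal to the antiblock count (4): A e^{-NL}."""
    return A * math.exp(-N * L)


def simulate_passage_times(count, N, seed=0):
    """Hypothetical helper: passage times of a random (Poisson) stream with rate N."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(count):
        t += rng.expovariate(N)
        times.append(t)
    return times


if __name__ == "__main__":
    N, L = 969 / 3600, 6.0   # cars per second and seconds, as for Table II
    times = simulate_passage_times(1536, N, seed=1)
    print(len(block_lengths(times, L)), "simulated blocks vs", expected_block_count(1536, N, L))
```

For the Table II parameters (A = 1536, N = 969/3600 per second, L = 6 seconds), `expected_block_count` gives about 305.5 blocks.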

The observation period contained 1536 cars, passing by at an average 
rate of 969 cars per hour. Several tabulations were made, each for a 
different assumed value of L. These values ranged from 2 to 12 seconds. 
Table II lists the results of the tabulation for a critical lag of 6 seconds,





1 The functions F(t) and B(t) had not yet been derived at the time reference [3] was written. 
Nevertheless the properties (a) and (b) are given there, deduced from geometrical considerations. 
which is a typical value. It will be noted that the number of blocks is
too low, by about 2 per cent, but otherwise the agreement between 
observation and theory is extremely close. 
Figure 3. Average and median waiting times (horizontal axis: NL, traffic
volume in cars per critical lag). This graph also gives the average and
median block lengths, provided the ordinates are increased by one.
TABLE II. Block Lengths, Theory vs. Observation
(A = 1536, N = 969/3600 cars per second, L = 6 seconds)

Length of block,     Number of blocks equal to or shorter than this length
in seconds           Observed          Theoretical
WILLARD PHILLIPS, A PREDECESSOR OF PAASCHE
IN INDEX NUMBER FORMULATION


Roy W. Jastram
University of California (Berkeley)


It is often supposed that the United States offered no noteworthy
writings on the theory of value until the appearance in 1858 of
Henry C. Carey’s The Principles of Social Science. And the reproduc- 
tion-cost theory of value contained therein is of more interest to the 
student of the history of economic doctrine as a curiosity than as a 
helpful contribution to the development of value theory. However, in a
relatively neglected work by Willard Phillips,¹ antedating Carey’s
Principles by thirty years, is found a contribution to the theory of 
value as significant as it is different from the views on the subject held 
by the majority of the writers of the period. 

Willard Phillips’ Manual of Political Economy was published in Bos- 
ton in 1828—ten years after Ricardo’s Principles, seven years after 
Malthus’ Principles and six years before Senior’s article on “Political 
Economy” in the Encyclopaedia Metropolitana. It appeared, then, at a 
time when the labor theory of value held sway among the political 
economists of England, and only the demurring voice of Malthus could 
be heard above the vigorous assertions of Torrens, McCulloch, and
James Mill in their efforts to state that theory of value in a more un- 
qualified form than did Ricardo. 

It is not my intent here to discuss the theory of value put forth by
Phillips.² I intend, rather, to call attention to a proposal which Phillips
made for the construction of index numbers. 

In Chapter II of his Manual of Political Economy Willard Phillips 
enters upon the search for a measure of value in “all times and places” 
with the words: 





1 The Dictionary of American Biography (Scribners, New York) furnishes the following information: 

Willard Phillips, lawyer, author, was born in Bridgewater, Massachusetts, in 1784. He was gradu- 
ated from Harvard in 1810, and was a tutor in that University from 1811 to 1815. During this period he 
studied law. 

Phillips commenced his practice of law in Boston in 1818. During the period 1825-1826 he served 
as a member of the legislature. In 1839 he was appointed probate judge for Suffolk County. This was the 
post he left in 1847 to accept the presidency of the New England Mutual Life Insurance Company. 

Phillips was a member of the American Academy of Arts and Sciences. He was the author of these 
works: A Treatise on the Law of Insurance (two volumes, appearing, respectively, in 1823 and 1834), 
Manual of Political Economy (1828), The Inventor's Guide (1837), The Law of Patents for Inventions 
(1837), and Propositions Concerning Protection and Free Trade (1850).

2 It bears similarities to Malthus’ doctrine and contains no taint of the labor theory. Phillips was 
aware that much of what he had to say was either new or controversial, for in the preface to the Manual 
he wrote, “The work is not proposed as embracing a system of doctrines all of which are already ratified
in the general opinion... .” 
The exchangeable value of things is estimated in the currency of the place,
and the prices current afford a scale of the comparative value of all commodities 
in the same place or market at the same time; the currency is therefore a good 
measure, or, if you please, standard, of value, for the same time and place; as in 
measuring lengths and quantities, though the instrument used be liable to con- 
tract and expand by different degrees of temperature and moisture, it is a good 
standard of comparison between things to which it is applied at the same time. 
But a standard or measure of value for different times and places, is a desideratum;
and, from the nature of the case, always must be.³


He then proceeds to examine, and to discard in turn, the precious 
metals, labor, wheat, and a mean between a quantity of labor and a 
quantity of corn as a measure of value. This last measure, however, 
evidently is suggestive to him, for he follows his consideration of it with 
his first positive contribution. 


The real value of any article depends upon the quantity of vendible com- 
modities in general, for which it may be exchanged. From an estimate of its 
value in any one or two of these, some inference may be made as to its value in 
the others, and this inference will be the more satisfactory in proportion as the 
thing or the different things with which it is compared, constitute a greater part 
of the whole value in the market, and like labour and wheat, have an influence in 
fixing the value of other things. The greater the number of comparisons made, the 
more satisfactory will be the inferences. 


This leads Phillips to consider “the construction of a table of the 
prices of certain quantities of all the principal subjects of purchase or 
reward at any one time or place; such as labor, articles of food and 
clothing; and a similar table for another time or place in reference to 
which the comparison is to be made.” He becomes particularly keen as 
he continues: 


Suppose a table of this sort to have been made ten years since, when prices 
were higher, making the whole amount of the prices of the assumed quantities, 
a thousand dollars, and that the aggregate amount of the prices of the same 
quantities should now be but three quarters of that sum. If we find that the 
money price of wheat has fallen twenty-five per cent during this period, we shall 
infer that the real exchangeable value of wheat is the same as it was ten years 
ago; but if its price remains the same, it has increased in value one third. This is 
the only general standard, or measure of value which could be of any great utility, 
or the ground of satisfactory inferences. The construction of such a table would 
doubtless be a work of great labour and skill. It would evidently be necessary to 
take different values of different things. Salt, for instance, is of as universal use 
as bread, though much less of it, in value as well as quantity, is consumed. A 
table, therefore, the result of which should be affected equally by the articles 
of salt and bread, or in which equal values of the two should be included, would 
not be a just measure of value. The quantities of the different articles assumed 





3 All quotations are from pp. 14-28 of the Manual of Political Economy. I am indebted to the John 
Crerar Library of Chicago for making a copy of the Manual available to me. 
ought to be in the proportion of the consumption or the amount possessed in the
country or district for which the measure is framed. The difference of the possessions
and consumption, in different places, and the change of habits in the same
place, present difficulties in constructing such a table. For instance, the substitu- 
tion of cotton for many uses to which linen was formerly applied, would now 
give cotton more space in a table constructed for the present time, than it would 
occupy in one constructed in reference to a former period. Without changing 
the amounts of articles to correspond to the differences of consumption, the table 
would not be a fair representation of the values used and consumed; and chang- 
ing the amounts renders the measure less satisfactory in application to any one 
article of which it is proposed to estimate the value at different times and places. 
An approximation to an accurate measure is all that seems to be practicable. 


Now this is a remarkable paragraph. In it Phillips is proposing to 
measure variations of general exchange value by the index number 
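formula $\sum p_1 q_1 / \sum p_0 q_1$, where $p_0$ and $p_1$ are the prices of the earlier and
later periods and $q_1$ the quantities of the later period. This is commonly known as Paasche’s formula;
its invention in 1874 has been credited to H. Paasche.⁴ Since Willard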
Phillips clearly described the method in 1828, it appears appropriate 
to suggest that Phillips be given credit for its invention. 
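Phillips' arithmetic can be restated with the aggregative formula. The sketch below is a hypothetical illustration, not Phillips' own procedure: `paasche_index` values later-period quantities at both periods' prices, and `real_value_change` divides an article's price relative by that index, which is how Phillips reads the change in the exchange value of wheat.

```python
def aggregate_cost(prices, quantities):
    """Money cost of a fixed bundle: the sum of price times quantity."""
    return sum(p * q for p, q in zip(prices, quantities))


def paasche_index(p0, p1, q1):
    """Sum p1*q1 / sum p0*q1: later-period quantities priced in both periods."""
    return aggregate_cost(p1, q1) / aggregate_cost(p0, q1)


def real_value_change(price_relative, index):
    """An article's price relative (later price / earlier price) deflated by the index."""
    return price_relative / index


if __name__ == "__main__":
    index = 750 / 1000                     # Phillips: the bundle now costs three quarters as much
    print(real_value_change(1.0, index))   # wheat price unchanged: value up one third
    print(real_value_change(0.75, index))  # wheat price down 25 per cent: value unchanged
```

With Phillips' figures the index is 750/1000 = 0.75, so `real_value_change(1.0, 0.75)` is about 1.333 (wheat at an unchanged price has gained one third in value) and `real_value_change(0.75, 0.75)` is 1.0 (a fall of 25 per cent leaves its real value unchanged).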

Further, so far as I am aware, G. P. Scrope is recognized as the first to
suggest that actual prices be multiplied by selected quantities to obtain
aggregates, the variations of which were to be taken as measures of variation of
general exchange value. Phillips’ proposal antedates that made by
Scrope; the latter’s Principles of Political Economy was published in
1833.⁵

It would appear proper, therefore, to acknowledge Willard Phillips 
as the originator of the aggregative type of weighted index number, in 
addition to crediting him with the formulation of a specific measure 
within this general type. 





4 C. M. Walsh, The Measurement of General Exchange-Value, Macmillan, 1901, p. 541.
5 See Walsh, op. cit., p. 555.
BOOK REVIEWS


Some Theory of Sampling. William Edwards Deming (Adviser in Sampling, 
Bureau of the Budget, Washington; Adjunct Professor of Statistics, Graduate 
School of Business Administration, New York University). New York: John 
Wiley & Sons, 1950. Pp. xvii, 602. $9.00. 


Lester R. Frankel, Alfred Politz Research, Inc.


During the past two decades, the increased use of statistical methods in
solving problems of government, industry, and commerce has been due
mainly to the development of survey sampling procedures. Much of the 
theory and techniques applicable to this phase of statistics has been pub- 
lished in scientific journals; other techniques in use have appeared in techni- 
cal appendixes of research reports. But a great deal of the theory of sampling 
has not been published at all, and because of the importance of statistical 
sampling and the dispersion of information on sampling, there has been for 
some years a growing demand for a full-length book dealing with the subject. 
This book satisfies the demand. 

The author’s aims are “to teach some theory of sampling as met in large- 
scale surveys ... and to develop in the student some power and desire for 
originality in dealing with problems of sampling.” In successfully achieving 
these aims, the author draws on his vast experience in government, in indus- 
try, as a consultant in marketing and as a teacher of statistics. 

The book has five parts. The first four deal with existing applications of 
the theory of probability to problems of sampling. The fifth part is devoted 
to advanced statistical theory and is intended to provide enough background 
and knowledge to enable students to develop new techniques in sampling. 
Part I deals with the preliminary problems arising in connection with a large-scale
survey. It points out that all operations involved in the execution of a
survey should be considered together, and that sampling techniques as such
cannot be separated from the other operations. In all surveys, we are concerned with various types of
errors and biases that may occur. Bias in a survey may enter not only from 
faulty sample design but also from non-statistical sources. It is pointed out 
that in some cases it may be desirable, in planning a survey, to allow an in- 
crease in the sampling error in order to make possible the reduction of these 
non-statistical errors. The goal is to minimize the total, over-all error. Part I 
clearly reflects the author’s background and experience in survey and sam- 
pling procedures. This reviewer recalls that some 10 to 15 years ago, when 
sampling methods were first used to any extent in this country, many surveys 
did yield incorrect results, even though the sampling procedures were correct. 
These failures were often laid to sampling because, at the time, the existence 
of non-statistical errors was not considered. 

Part II deals with the design of sample surveys after the initial specifica- 
tions have been set. In the entire discussion, the aim is to design a sample, 
applicable under certain administrative conditions, that will yield the smallest
possible sampling error. The theory is first developed through a discussion
of moments and expected values. Then the reader is introduced to problems 
of random sampling, both from the procedural aspect in the selection of 
random samples and from the statistical aspect in the derivation of variances 
of statistics derived from random samples. While unrestricted random sam- 
pling provides the simplest theory, it is seldom used because of many adminis- 
trative limitations and also because more efficient types of sampling may be 
employed. Multi-stage sampling, where the final selection of sampling units is 
obtained after two or more stages of sampling; ratio estimates; and cluster 
sampling are then discussed. In connection with cluster sampling, the close 
connection between theory and practice is clearly visible. The author de- 
velops the formula for the variance of a cluster sample, analyzes the various 
components, and suggests practical procedures designed to minimize the 
variance. In the following chapter, titled “Allocations in Stratified Sampling,” 
proportionate, Neyman, and other types of allocation under various cost 
conditions are investigated. It is pointed out that the various sampling pro- 
cedures discussed previously may be applied within the framework of strati- 
fied sampling. 
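For readers unfamiliar with the allocations the review mentions, here is a minimal sketch of the two textbook rules, assuming equal cost per sampled unit in every stratum: proportionate allocation sets $n_h \propto N_h$, and Neyman allocation sets $n_h \propto N_h S_h$, where $N_h$ is the stratum size and $S_h$ its standard deviation. The function names are hypothetical and the notation is not taken from Deming's book.

```python
def proportionate_allocation(n, sizes):
    """Allocate a total sample n in proportion to the stratum sizes N_h."""
    total = sum(sizes)
    return [n * N_h / total for N_h in sizes]


def neyman_allocation(n, sizes, std_devs):
    """Allocate n in proportion to N_h * S_h (equal cost per unit in every stratum)."""
    weights = [N_h * S_h for N_h, S_h in zip(sizes, std_devs)]
    total = sum(weights)
    return [n * w / total for w in weights]


if __name__ == "__main__":
    sizes = [200, 300, 500]
    print(proportionate_allocation(100, sizes))
    print(neyman_allocation(100, sizes, [1.0, 2.0, 1.0]))
```

For three strata of sizes 200, 300 and 500 and a total sample of 100, proportionate allocation gives 20, 30 and 50 units; if the middle stratum's standard deviation is twice the others', Neyman allocation shifts units toward it (about 15.4, 46.2 and 38.5 before rounding).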

The discussion thus far in the book has centered around the use of sam- 
pling techniques to obtain an estimate of the results that would have been ob- 
tained had the same enumerative procedure been applied to every member 
of the universe. This type of study is known as an enumerative study and is 
usually associated with the concept of a finite population. In the next 
chapter, a distinction is drawn between this type of study and an analytic 
study, where interest is in the underlying cause system and where even a 
complete census may be regarded as a sample. The final chapter in this 
section makes use of this concept in discussing acceptance sampling, that 
phase of sampling which is a form of quality control. 

The appraisal of survey results after the data have been collected and 
processed is discussed in Part III. The first chapter is concerned with the 
general problem of statistical inference. It emphasizes that the results of a 
sample survey provide a basis for action; in interpreting the findings, one 
must take into account the statistical problems of estimation and tests of 
hypothesis. The second chapter in this part describes methods of obtaining 
sampling errors of statistics derived from the survey from the actual obser- 
vations themselves. In the design of the sample (Part II), the statistician 
usually does not know the exact values of the various components of the 
statistical error. He draws on his experience and upon related data to obtain 
indications, and then proceeds to design a sample which he believes will 
yield minimum variance of the statistics being estimated. After the survey 
has been completed, procedures are available to obtain unbiased estimates of 
error. 

Two applications of the material covered in the preceding chapters are 
given in Part IV. The first application is concerned with a sample survey of 
tire distributors in order to obtain an estimate of tire inventories. The second 
application deals with a population sample for Greece. Both of these applications
appear almost verbatim in the 1946 and 1949 volumes of this Journal.
It might perhaps have been more useful to shorten these two examples and 
add one or two others. Both of these surveys involved huge operations con- 
ducted on a nationwide scale. In addition, the designs used in both samples 
were conditioned by the fact that the government agencies responsible for 
the execution of the surveys had access to records and facilities not available 
to private practitioners. Statisticians in business would have appreciated a 
description of a commercial sample survey. 

Despite the author’s statement, in the Preface, that the book is not a text 
on mathematical statistics, Part V may be considered one. The author has 
assembled in this section a great deal of the statistical theory on such topics 
as the binomial and related distributions, the Beta and Gamma functions, 
and distribution theory. 

The entire treatment of the subject matter in this book is on a high, pro- 
fessional level. The reader is assumed to have a knowledge of elementary 
statistical methods, college algebra, and some calculus. However, students 
who rely on this minimum requirement may have some difficulty in certain 
sections of the book. The author has made clever use of the devices of “exer- 
cises” and “remarks” to discuss and elaborate pertinent topics and sidelights 
as they appear, without interrupting the continuity of the text. These 
“exercises” and “remarks” are extremely important and a great deal of the 
theory of sampling will be missed by the reader who decides to skip them. 
The information in this book is so extensive that the presentation may 
appear bewildering at first glance. But once the reader gets acquainted with 
its contents, it becomes clear that the topics are developed logically and 
systematically. It seems likely that for some time to come this book will be 
the “bible” of sampling statisticians. 


An Introduction to the Theory of Statistics. G. Udny Yule (formerly Reader in 
Statistics, University of Cambridge) and M.G. Kendall (Professor of Statistics, 
University of London). New York: Hafner Publishing Company, 1950. Pp. xxiv, 
701. $7.50. 


The Preface states that “this fourteenth edition is again a substantial
revision. .. . Most of the alterations are additions, but the treatment of 
the theory of attributes, which in earlier editions occupied five chapters, has 
been condensed into three to make room for new material. 

“The major additions fall into two groups. Chapters 21-23 expand the 
former treatment of small-sample theory and give an introduction to the 
practical problems of samplings. Chapters 25-27 give an account of index- 
numbers and the elementary theory of time-series. Chapter 13 on practical 
problems of correlation has also been re-written. Additions have been made 
in the remaining chapters to keep the treatment abreast of new discoveries, 
some of the examples have been modernised and some further exercises 
added. The list of references has been omitted because a much more extensive
bibliography has now appeared in volume 2 of [Kendall’s] Advanced
Theory of Statistics.” 


Statistical Decision Functions. Abraham Wald (Professor of Mathematical Sta- 
tistics, Columbia University). New York: John Wiley & Sons, Inc., 1950. Pp. ix, 
179. $5.00. 


See the article by L. J. Savage, pp. 55-67 in this issue. 


Economic Statistics. Lewis E. Maverick (Professor of Economics, Southern Illinois
University). Published by the author at Carbondale, Illinois: 1949. Pp.
viii, 171. $3.00. 


Philip G. Fox, University of Wisconsin


This book purports to deal with a “First Category of Distributions” (Frequency
Distributions) and with a “Second Category of Distributions”
(Time Series). 

The treatment of frequency distributions is inadequate and frequently 
misleading. A good many of the explanations are fairly obscure and some of 
them are definitely wrong. The uninitiated, for whom the book would appear 
to be intended, would be confused by unconventional substitution of terms 
such as “standard dispersion” for the universally used “standard deviation.” 
More experienced scholars who have some knowledge of the mathematical 
characteristics of their measures will be surprised to find that the standard 
deviation is “usually” calculated from the arithmetic mean as a central 
point. Such basic concepts as discreteness and continuity, the mode, the 
geometric mean, the coefficient of variation, and sampling, are left in a 
baffling condition. Much of the discussion on correlation consists of brief 
notes rather than exposition. 

The second part of the book ends with an interesting free-wheeling method 
of smoothing extended time series. This is apparently a condensation of Mr. 
Maverick’s earlier book on Time Series Analysis. The rest of the section 
treats the most elementary methods of secular trend fitting, seasonal meas- 
urement, and index numbers, with illustrations which are not too clear. The 
book needs a good deal of development before it will be an adequate text. 


The Role of Measurement in Economics. Richard Stone (Director of the Depart- 
ment of Applied Economics and Fellow of King’s College, University of Cam- 
bridge). The Newmarch Lectures, 1948-1949, given at University College, Lon- 
don. Department of Applied Economics Monograph No. 3. London, England: 
Cambridge University Press, 1950. Pp. viii, 85. $2.50. 


Walter D. Fisher, El Cerrito, California


This publication of four lectures is essentially a brief exposition of the
econometric approach with illustrations from empirical work, chiefly 
from the author’s own work on market demand. 
The first two lectures (Sections 1 through 13) give a general discussion of
econometrics. Stone considers this word to have a broad meaning, including 
the tasks of collecting facts and forming “empirical constructs,” such as 
index numbers, as well as testing economic hypotheses, estimating param- 
eters, and making predictions. Problems of estimation and prediction are 
approached by means of simultaneous equations; emphasis is given to lagged 
variables and the auto-regressive form. Numerical examples of hypothetical
non-stochastic models show how government by altering the model can 
change the time-paths of the variables, e.g. stop oscillations. The assumption 
of exact knowledge underlying these examples points up the practical diffi- 
culties of this approach; in the author’s opinion the econometrician will need 
advice from “practical” economists on probable magnitudes of exogenous 
variables before he can make predictions. 

The third lecture (Sections 14 through 21) deals briefly with “the problems 
of definition, measurement, and collection that arise in the field of national 
income and expenditure, or, as I think it is better called, social accounting” 
(p. 38). Discussion is mainly devoted to definitions, formal properties, and 
analogies with accounting. Official statistics of the British economy in 1948 
are inserted as an example. The lecture ends with a plea for redefinitions of 
official data and increased use of sampling design. 

The last lecture (Sections 22 through 28) is a treatment of demand analysis 
and a presentation of results of the author’s own empirical market demand 
studies on 13 British and 30 United States commodities. Most of these find- 
ings are evidently published here for the first time, although methods, sources 
of British data, and some preliminary results appeared in previous articles 
by the author in 1945 and 1948 (cited on p. 71). The approach here is con- 
ventional, exclusively single-equation, with subsets of variables selected ac- 
cording to the Frisch bunch-map technique. Constant-elasticity type equa- 
tions are fitted to national annual averages during the interwar period—a 
series of 19 years for Great Britain and 12 for the United States. 

It is clear from this book that the author is genuinely interested in both 
abstract theory and empirical work. Yet there is an unevenness in treatment; 
the sections containing empirical illustrations do not illustrate many of the 
theoretical points raised. For example, too much emphasis is put on simul- 
taneous equation models in the earlier sections in view of the fact that they 
are not used at all in the examples on estimation of parameters. The use of a 
single-equation approach may be justifiable here, but one is not sure. No 
study is yet available which fully assesses the gains from using the methods 
of estimation developed by the Cowles Commission, or which weighs the 
gain against the increased cumbersomeness and cost of computation. All we 
know is that, in general, there is some gain. It would have been of interest, 
however, for the author to have used simultaneous equations on a few of his 
commodities and observed the results. 

The magnitude of the author’s accomplishment, as evidenced by the 
empirical studies, is considerable. Certainly these are the most extensive 
studies published of demand for individual commodities and commodity
groups since the appearance of Henry Schultz’s book in 1938. We have 
everything from food and clothing to luggage and pleasure craft. The great 
majority of the numerical findings are plausible on a priori grounds; for 
example, the elasticities for the two latter groups mentioned are higher than 
for the former two. It is to the author’s credit that the implausible findings
are published as well as the plausible ones, in a rather complete listing of
144 regression analyses, including alternative subsets. (There seems to be
some doubt as to the exact value of the price elasticity for monuments and
tombstones, but it is given as near -1.)

A question that comes quickly to the mind of the reader is the significance 
of the findings in view of the small sample size. With a set of four variables 
fitted to the United States data and a liberal interpretation of independence 
of observations, the number of degrees of freedom is 7. The author has pro- 
tected himself with qualifying statements and claims that crude and tenta- 
tive conclusions may still be of help in answering practical questions. 
Granted—provided one does not confuse such conclusions with established 
facts and continues to seek more information. 

A noteworthy point of statistical merit is the attention given to serial 
correlation, with attempts to test hypotheses that residuals are random over 
time; however, the use made for this purpose of the ratio of mean square 
successive difference to variance is sketchy. A methodological defect is the 
rather cavalier and rigid treatment of “all other prices,” in most cases as a
single index number. Some of the effects attributed to this variable may be 
really effects of income. 
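
The ratio of the mean square successive difference to the variance (von Neumann's ratio) can be computed directly from a series of residuals. This minimal sketch assumes the common definition in which the sum of squared successive differences is divided by n - 1 and the variance by n; under that definition independent residuals have an expected ratio of 2n/(n - 1). The function name and the two test series are hypothetical.

```python
import numpy as np

def von_neumann_ratio(residuals):
    """Ratio of the mean square successive difference to the variance.

    delta^2 = sum((x[i+1] - x[i])**2) / (n - 1)
    s^2     = sum((x[i] - mean)**2) / n
    """
    x = np.asarray(residuals, dtype=float)
    n = x.size
    delta2 = np.sum(np.diff(x) ** 2) / (n - 1)
    s2 = np.sum((x - x.mean()) ** 2) / n
    return delta2 / s2

# Independent residuals give a ratio near 2n/(n - 1); values well below it
# suggest positive serial correlation, values well above it negative.
alternating = [1, -1, 1, -1, 1, -1]   # strongly negatively correlated
trending = [-3, -2, -1, 1, 2, 3]      # strongly positively correlated
r_alt = von_neumann_ratio(alternating)
r_trend = von_neumann_ratio(trending)
print(r_alt, r_trend)
```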

One minor theoretical point may be noted. On page 24 the criterion stated 
for identifiability of the parameters of a single equation is incomplete. The 
condition that two variables do not appear in one of the equations listed is 
not of itself sufficient for identification. What T. Koopmans calls the rank 
condition is omitted. In the particular example given, this condition happens 
also to be satisfied, but this will not always be so. 
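
In the usual textbook statement, for one equation in a system of G linear equations the order condition asks that at least G - 1 of the system's variables be excluded from it, while the rank condition asks that the coefficients of those excluded variables in the other G - 1 equations form a matrix of rank G - 1. The sketch below checks both conditions for a hypothetical three-equation system whose first equation excludes two variables. The coefficients are invented to show a case where the order condition holds but the rank condition fails; they are not the example on page 24.

```python
import numpy as np

def identification_check(A, eq, tol=1e-10):
    """Order and rank conditions for equation `eq` of a linear system.

    A is a G x K matrix of structural coefficients: row g holds the
    coefficients of equation g on all K variables, with zeros marking
    variables excluded from that equation.
    """
    A = np.asarray(A, dtype=float)
    G = A.shape[0]
    excluded = np.isclose(A[eq], 0.0, atol=tol)
    order_ok = excluded.sum() >= G - 1
    # Coefficients of the excluded variables in the other equations.
    others = np.delete(A, eq, axis=0)[:, excluded]
    rank_ok = others.size > 0 and np.linalg.matrix_rank(others) == G - 1
    return bool(order_ok), bool(rank_ok)

# Hypothetical columns: y1, y2, y3 (endogenous), z1, z2 (exogenous).
A_fail = [[1.0, -0.5, 0.3, 0.0, 0.0],
          [0.2, 1.0, 0.0, 1.0, 2.0],
          [0.0, 0.4, 1.0, 2.0, 4.0]]   # rows 2-3 on z1, z2 are proportional
A_ok = [[1.0, -0.5, 0.3, 0.0, 0.0],
        [0.2, 1.0, 0.0, 1.0, 2.0],
        [0.0, 0.4, 1.0, 0.0, 1.0]]
print(identification_check(A_fail, 0), identification_check(A_ok, 0))
```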

The author’s basic point of view is eloquently summed up in the last few 
pages of the book. He appeals for more understanding between statistical 
and other economists, for better statistics, and for the continued develop- 
ment of economics into a science. 


Moe’s Principle: An Econometric Investigation Intended as an Aid in Dimen- 
sioning and Managing Telephone Plant. Arne Jensen. Copenhagen: The Copen- 
hagen Telephone Company, 1950. Pp. 164, Paper. 


Thornton C. Fry, Bell Telephone Laboratories


This book is concerned with apportioning money most effectively among
different productive factors (machines of various kinds, labor, etc.) so as 
to maximize a desired objective. In the telephone case the “productive factors” 
would be the various kinds of apparatus in a telephone exchange, the network 
of wires which connect the exchanges to the subscribers and to each other,
the operators, etc. But the problem is approached in such general terms as 
to include a wide variety of industrial problems outside the telephone field. 
For example, the “desired objective” could be any one of a number of things. 
It could be to get the greatest volume of profits, or the greatest percentage 
of profits. In the telephone industry, it would typically be to stimulate the 
largest use consistent with a given percentage return. 

The theory is derived by conventional methods of the calculus of
variations. The text is brief: fewer than 30 pages. The remaining 140 pages are
devoted to numerical tables. Three of these are basic functions which arose 
in Erlang’s study of the fluctuations of random traffic. The other three 
tables contain linear combinations of these functions which arise in Moe’s 
theory when applied to telephony. These three would not be required in 
many applications of Moe’s theory. They arise only where, as in telephony, 
some of the productive factors are related to random fluctuations in demand. 
But conversely, the first three are of prime interest in dealing with such fluctua- 
tions, whether or not Moe’s theory is involved, and since the tables are the 
most extensive of which I have any knowledge, they should be of great value 
to all who have to deal with such matters. This field is wider than is generally 
realized. It would include, for example, such diverse subjects as building 
codes for water supply and waste pipes in large buildings, reordering sched- 
ules for stockroom management, or how the number of spindles that can 
profitably be attended by one workman varies with capital and labor costs. 
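
The review does not name the three basic functions tabulated in the book. As an illustration of the kind of random-traffic function involved, the sketch below computes Erlang's loss formula, the probability that a call is lost when A erlangs of random traffic are offered to n trunks, using the standard recursion. The helper that searches for the smallest adequate number of trunks is a hypothetical usage example, not one of Jensen's tables.

```python
def erlang_b(n, a):
    """Erlang's loss formula B(n, A) via the standard recursion
    B(0, A) = 1,  B(k, A) = A*B(k-1, A) / (k + A*B(k-1, A))."""
    b = 1.0
    for k in range(1, n + 1):
        b = a * b / (k + a * b)
    return b

def trunks_needed(a, grade):
    """Hypothetical helper: smallest n whose loss probability is <= grade."""
    n = 0
    while erlang_b(n, a) > grade:
        n += 1
    return n

print(erlang_b(2, 1.0), trunks_needed(1.0, 0.01))
```

The same calculation could size any pool of servers facing random demand, which is one reason tables of such functions are useful well beyond telephony.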

There is no question as to the value of the book; the group of three basic 
tables is quite sufficient to establish that. As to the practical value of the 
theory, I am not so certain. Functions are usually flat in the neighborhood of 
optimum values, sometimes surprisingly so. Hence, the productive factors 
need not be determined with high accuracy. Also, before the theory can be 
applied, one must state the objective toward which the industry is working— 
whether maximum profit, maximum service, or a compromise between the 
two. This introduces a judgment factor. The practical necessity of building 
reserves for future growth also requires judgment, and in addition may not be 
easy to incorporate in a theoretical study. One therefore is led to ask how 
much benefit to expect from judgment guided by the theory, as compared 
with the direct application of judgment to the productive factors themselves. 
It is to be regretted that the book does not discuss the uses to which the 
Copenhagen Telephone Company has put the theory, and the benefits de- 
rived from doing so. 

The publisher is the Copenhagen Telephone Company. For half a century 
its name has been associated with outstanding mathematicians and out- 
standing mathematical projects. Early in this century J. L. V. W. Jensen, 
who held a high executive position in the company, was internationally 
known as a pure mathematician. A. K. Erlang, who did outstanding work on 
the application of probability theory to traffic problems, is not so well known 
to pure mathematicians, but is recognized in telephone circles as a foremost— 
perhaps the foremost—worker in his field. K. Moe, late Chief Engineer of the
company, occupied himself with the mathematical problem of industrial 
management which led to the book under review. And Arne Jensen, the 
present Actuary of the company, has devoted himself to presenting a con- 
nected account of Moe’s theory, and to implementing it with adequate 
numerical tables. Finally, the Company has provided for the publication, 
not only of the present book, but also of the collected works of Erlang which 
appeared in 1948. This is a proud record for a company of such modest size.


Introduction to Business Cycles. Ascher Achinstein (Senior Specialist, Legislative 
Reference Service, Library of Congress). New York, Thomas Y. Crowell Com- 
pany. 1950. Pp. xvi, 496. 


Rutledge Vining, University of Virginia


In its general form, this is a rather conventionally organized textbook for a
course in business cycles. Part I, making up the first third, is devoted to 
“Business Cycle Theories.” Part II, covering the next third, presents the 
“Empirical Aspects of Business Cycles.” The last third, Part III, considers 
“Secular Trends and Cyclical Fluctuations” by reviewing the opinions and 
theories of the prominent discussers of the many topics that may fall under 
this heading. 

For my particular preferences there is a disproportionate amount of space 
devoted to the opinions of men regarding the “causes” of business cycles, 
and the views of too many men are examined too briefly. If the book is in- 
tended for an undergraduate course, I am inclined to think that the students 
will get little out of this part of the discussion except memorized opinions of 
certain men about certain other men’s works; and I believe that most in- 
structors will find much to quarrel with in the brief presentations and critical 
reviews of the various theories. This instructor, for example, feels that there 
are important methodological confusions involved in the many critical re- 
marks made with respect to the limitations of equilibrium economics. 

The distinctive feature about the book is that it represents an effort to 
present the material from the point of view of Wesley Mitchell, and I believe 
that those who know Mitchell’s work most intimately regard this as a re- 
liable description of Mitchell’s procedure and research point of view. Almost 
the whole of Part II consists of a description of the procedures used by the 
National Bureau in analyzing business cycles. Chapter 14 deals with “The 
Problem of Defining and Identifying Business Cycles”; Chapter 15 with 
“Measuring Cycles with the Aid of Business Annals and Business Indexes”;
Chapter 16 with “Statistical Methods of the National Bureau of Economic
Research”; Chapter 17 with “National Income and Business Cycles”; Chapter
18 with “Patterns of Cyclical Fluctuation”; Chapters 19 and 20 with “Busi- 
ness Cycles in the United States, 1919-1938.” All of it fits into the general 
scheme of Mitchell’s published work. 

The criticism that I would have to offer would be principally a matter of 
pedagogy. I readily agree with Achinstein that Mitchell was an inspiring 
man whose work deserves special attention in the curricula of departments
of economics. But I do not believe that his important contribution will be 
passed on to students in a conventional course in business cycles. Mitchell, 
as E. B. Wilson has remarked, was essentially a great naturalist who col- 
lected, classified, and probed tirelessly to see what his subject matter could 
teach him. He was ever engaged in a vast game with Nature, and like all 
exceptional inquirers he possessed a talent for and experienced a keen delight 
in sheer problem-solving where methods must be invented or adapted as the 
work proceeds. It is the taste and talent for problem solving that should be 
developed in students, and training for problem-solving in Mitchell’s field 
involves more than a textbook course in business cycles. 

Mitchell, I think, conceived of himself as studying a living “thing.” He 
is known as a measurer of business cycles, and the business cycle is spoken of 
as the “unit” of Mitchell’s study. But I believe that Mitchell looked upon 
the business cycle as only an important apparent behavior characteristic of 
the basic entity that he kept under observation. From this point of view, the 
actual arithmetic procedures adopted for measuring various attributes of 
the business cycle are interesting but secondary. As his ideas developed re- 
garding how the “thing” that he was studying was put together and how it 
functioned, he felt around for ways of arranging and comparing “new” as- 
pects of the data; and presumably if he had it all to do again, the details of 
his methods would have been different from those that he happened to hit 
upon. The primary idea is not the method of measurement but rather the 
nature of the “thing” whose behavior is being observed. 

Mitchell, it seems to me, held as the object of his inquiries the structure 
and functioning of what he saw as a vast social organism—the integrated 
economic system. He was, of course, much influenced by Veblen, and the 
idea basic to his work is the conception emphasized by Veblen of an economic 
system as a concatenation or articulation of industrial processes into a unified 
and balanced mechanical system of operations. Organization and structure are 
fundamental elements of the idea. An immensely complex division of labor 
and dovetailing of individually determined activities exist, and a system of 
concrete flows is a feature of the structure consisting of a population con- 
figuration of interdependently acting and reacting units. 

The economic system, of course, is not a physical flow system. But it has 
features in common with such a system. When these features are made ex- 
plicit, and when appropriate arrangements of population data, traffic phe- 
nomena, and economic flow data are made, there are at least suggestions of 
probabilistic models that have been found widely applicable in a considerable 
number of different fields of investigation. In order that problems of eco- 
nomic variation may be formulated in such a way that this theory may be 
brought to bear for any use that may be found in it, knowledge of the sort 
sought by Mitchell is necessary. The analysis of a situation involving prob- 
ability considerations begins with a specification of the conditions under 
which the variation comes about. In his formulation of the problem (whether 
it be a game of chance or a situation observed in nature), the analyst
conceives of himself as confronted with an idealized experiment, capable of being
repeated, such that a sequence of “results” may be recorded. It may be 
diffusion phenomena, traffic phenomena, aspects of migration, inheritance, 
communications, or what not. Specification of the conditions of the “experi- 
ment” requires knowledge regarding the essential features having a bearing 
upon the “results.” There is said to be an art in probabilistic formulation— 
an intuitive insight, capable of development by training, that may see in a 
complex situation the essential attributes of a relatively simple probability 
model that brings what appears as chaotic variation into some sensible order. 
What may appear to untutored common sense as essential features may turn 
out to be not essential, and of course conversely. But before a situation can 
be analyzed, that situation must be specified. 

This working toward the specification of the conditions of variation ob- 
servable in the structure and functioning of an economic system is, to me, the 
important contribution of Mitchell and the research point of view that he 
represented. But, again, building upon or adding to the work of such a man 
consists not in merely doing mechanically with figures what he found it 
expedient to do with them and accumulating more and more of what he, in 
his probing and searching, found occasion to accumulate. To build upon his 
work is to proceed from his point of view, “seeing” the outlines of the
“thing” that he was observing, and developing methods of measurement and 
description and glimpsing aspects of the phenomena that one might at least 
imagine Mitchell to have discerned and developed. 

From this point of view, the task in teaching is that of devising a curricu- 
lum that will most effectively assist students to acquire a knowledge of what- 
ever quasi-permanencies and uniformities may be discovered in the structure 
and functioning of an economic system, and that will develop in them the 
intuition apparently necessary for at least moderate success in pure problem- 
solving. 

Thus, my criticism is really not of Mr. Achinstein’s book. I feel fairly 
confident that men for whose judgment I have a very high regard would 
consider the book a rather good text and a reliable treatment of the National 
Bureau techniques. My criticism is of the type of college courses for which 
the book is designed, for I believe that curricula in departments of economics 
are in as serious a need for overhauling as curricula in statistics are said to be. 


Business Cycles in Selected Industrial Areas. Philip Neff and Annette Weifen- 
bach (John Randolph Haynes and Dora Haynes Foundation). Berkeley and Los 
Angeles: University of California Press. 1949. Pp. xiii, 274. $4.00. 


John H. Cover, University of Maryland


Four statistical series (bank debits, department store sales, industrial and
commercial power sales, and industrial employment) in six cities (Chicago,
Cleveland, Detroit, Los Angeles, Pittsburgh, and San Francisco) provide the
basic material of this book. Its objectives were “to measure the
similarities or differences in timing, duration, and pattern of business cycles in 
the six industrial areas. . . .” The methods employed are the usual procedures
for time-series analysis presented in textbooks and in the cycle studies of 
the National Bureau of Economic Research. 

The authors faced the usual problems of wide variations in areas, in con- 
stituent statistical series, and in time-periods of available data. For instance, 
power sales in Chicago represent “indexes of ‘Large Power Sales’ in the 
Chicago area (6,000 square miles in northern Illinois . . .”; “The Cleveland 
series actually is ‘Commercial and Industrial Power Sales’ but refers to the 
Cleveland Metropolitan District . . .”; “The power sales series for Detroit
and Pittsburgh represent industrial power sales only . . .”; “The power sales 
series in the San Francisco area provides the least satisfactory comparison 
with the other areas...” (pp. 197-8). These limitations of comparability 
are frankly recorded by the authors, but analysis proceeds as though con- 
fessions of incomparability granted dispensation. 

Considerable space in the opening chapters is given to sketching business 
cycle theories including sequential relationships, and subsequently a number 
of these propositions become reference criteria for the behavior of the local 
series. A regrettable error occurs when the text assumes that its conclusions 
with respect to its “industrial areas” are extendable to “regions” as employed 
in a number of other studies, and that its limited statistical series provide 
a test of “regional” lead and lag. Again, the statement that the “whole field 
of regional economic studies has been neglected” suggests that the activities 
of almost fifty university bureaus of economic and business research, a 
number working in this field for thirty years, have been neglected. 


Cyclical Diversities in the Fortunes of Industrial Corporations. Thor Hultgren.
Occasional paper 32. New York: National Bureau of Economic Research, Inc., 
1950. Pp. v, 29. $.50. 


J. F. Weston, University of California (Los Angeles) 


This is a highly competent beginning of an analysis of the behavior of
corporate net incomes during business fluctuations. During a business upswing,
the number of companies with increasing net incomes rises and reaches its 
peak before the upper turning point. The decrease in the number of com- 
panies with improving net revenues continues into the following contraction, 
but reaches its trough before the lower turning point. Peaks in the net re- 
ceipts of individual companies were much more common near a reference 
cycle peak than near a reference trough. Minimum levels of net incomes 
tended to concentrate near reference low points. There was considerable dis- 
persion about these general tendencies. For example, during the quarter in 
which the smallest number of profit increases took place during the 1929-37 
depression, 26 per cent of the corporations studied had rising profits. In the
quarter of the 1920’s during which the greatest number of corporations ex- 
perienced increases in earnings, 23 per cent had declining net incomes. 

The pattern of aggregate net incomes of the companies in the samples 
closely paralleled that of the reference cycle. Aggregate net incomes,
however, presented no consistent lead or lag with respect to the reference cycle.
The turning point in aggregate net incomes coincided with the reference 
turn five times, preceded it by one quarter twice, by two quarters twice, and 
lagged by one quarter once. Hultgren’s sample permitted only broad indus- 
try classifications. He found that the net income behavior for producers of 
durable goods was not consistently different from the pattern found for 
producers of non-durable goods. 

The use of net income data to forecast the reference cycle is considered. 
Hultgren examines the use of changes in the number of companies experienc- 
ing rising revenues as a device for forecasting a change in the general level 
of business activities. Because of the varying leads, the forecast could only 
say that the change in the general level of business activity would take place 
sometime during a range of the following twelve months. Application of the 
device during the interwar period would have yielded correct results in six 
of nine instances. The analyst would have made his forecast two quarters 
after the turn in the number of companies had taken place. 
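
The forecasting device rests on counting, quarter by quarter, how many companies show rising net income, and watching for turns in that count. A minimal sketch, assuming a hypothetical company-by-quarter array of seasonally adjusted net incomes and a deliberately simple rule for local peaks and troughs (the review does not say how Hultgren dated turns):

```python
import numpy as np

def rising_counts(net_income):
    """Number of companies whose net income rose from the previous quarter.

    net_income: 2-D array, rows = companies, columns = consecutive quarters.
    Returns one count per quarter after the first.
    """
    x = np.asarray(net_income, dtype=float)
    return (np.diff(x, axis=1) > 0).sum(axis=0)

def turning_points(series):
    """Indices of simple local peaks and troughs in a 1-D series."""
    s = np.asarray(series)
    peaks = [t for t in range(1, len(s) - 1) if s[t - 1] < s[t] >= s[t + 1]]
    troughs = [t for t in range(1, len(s) - 1) if s[t - 1] > s[t] <= s[t + 1]]
    return peaks, troughs

# Hypothetical net incomes for four companies over six quarters.
net = [[10, 12, 14, 13, 12, 13],
       [5, 6, 6, 5, 4, 5],
       [8, 9, 10, 11, 10, 9],
       [3, 4, 3, 2, 2, 3]]
counts = rising_counts(net)
print(counts, turning_points(counts))
```

Under the device the review describes, a trough like the one at index 3 of the count series would be recognized only about two quarters later and then read as signalling an upturn in general business sometime within the following twelve months.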

The data tend to confirm W. C. Mitchell’s hypotheses on the behavior of 
profit margins during business cycles. However, Hultgren observes that the 
variations in net incomes could also be produced by variations in volume 
while unit margins remained constant. He suggests that new investment in a 
business expansion may be concentrated in enterprises whose net incomes 
begin to decline. The consequent decrease in additions to assets by these 
firms may cause aggregate investment to decline while aggregate net receipts 
continue for a time to increase. 

Hultgren’s analysis is based on quarterly net income data for individual 
companies compiled by the National City Bank of New York, supplemented 
by additional data collected by Harold Barger. The data analyzed were 
adjusted for seasonal influences. Since his analysis was necessarily confined 
to companies for which net income data was available for at least a full cycle, 
the number of companies in his samples varies. His earliest sample contains 
17 companies while his latest includes 244. The smallness of his samples 
prevented detailed analysis by industry. The samples represent in the main 
only the very largest companies in the American economy. This may limit 
somewhat the generality of his findings, but is a limitation imposed by the 
nature of readily obtainable data. 


Urban Mortgage Lending by Life Insurance Companies. R. J. Saulnier (Director 
of the Financial Research Program of the National Bureau of Economic Re- 
search and Professor of Economics at Barnard College, Columbia University). 
National Bureau of Economic Research, Inc., New York 23, New York. 1950. 
Pp. xxi, 180, $2.50. 


A. M. Weimer, Indiana University


Professor Saulnier’s book is the first of a series of studies by the National
Bureau of Economic Research in the field of urban real estate financing.
Based largely on original data furnished by life insurance companies, the
book adds substantially to the extremely limited library of knowledge on this
increasingly important subject. 

The first three of the six chapters provide a brief description of the scope 
of urban mortgage lending by life insurance companies, the legal framework, 
and the internal organizational structure. Chapter 4 is a description of the 
urban mortgage market served by life insurance companies, based on a one 
per cent sample of urban mortgage loans made by 24 leading life insurance 
companies over a 27-year period, 1920-46. Chapter 5 contains an analysis
of mortgage lending costs and returns, based on questionnaires returned by 
life companies for the years 1945, 1946, and 1947. Chapter 6, based primarily 
on the one per cent sample of loans made, is an attempt to reconstruct the 
loan experience of life insurance companies and for that reason is undoubt- 
edly of greatest interest to the statistician. 

The problems involved in any study of mortgage experience need to be 
kept in mind in making an appraisal of Professor Saulnier’s work. In the 
first place, there is no accepted standard as to what is good or bad experience. 
Foreclosure, especially foreclosure with a resultant loss, is definitely a bad 
experience, but one whose incidence is so small that valid conclusions cannot 
be drawn except from a very large number of cases covering a long period of 
time. 

A second difficulty is that many of the more important characteristics of 
mortgages, including some that are believed to be closely related with 
mortgage experience, are not described in any retained records of mortgage 
lenders, or if described are not subject to statistical treatment. For example, 
neighborhood trends, income and occupation of borrowers, quality of con- 
struction and suitability of design undoubtedly have an important effect on 
mortgage experience. These and other variables, however, must usually be 
ignored in any study based on past experience. 

In view of these difficulties, Professor Saulnier’s results certainly must be 
considered satisfactory. Experience over the past thirty years is far from 
conclusive as to what a given life company should do to minimize its own 
foreclosures and losses. The principal conclusion seems to be that during a 
period of severe depression many mortgages will be foreclosed and losses will 
have to be taken. 

Differences in rates of foreclosure and in loss experience are presented for 
mortgages in different geographical areas, or secured by different types of 
property, or with other differences in characteristics. Some of the tables 
leave the reader with the conflicting impressions on the one hand that the 
author has squeezed the data too hard and on the other that he has not fully 
analyzed the material available to him. For example, foreclosure rates are 
published for one classification which contained but one foreclosure out of a 
total of only nine loans. On the other hand, for a group of 2,580 home 
mortgage loans made in metropolitan districts during the critical 1920-29 
period there are no separate foreclosure rates for any property or mortgage 
characteristics. 
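
The reviewer's worry about a foreclosure rate based on one case in nine can be made concrete with a binomial confidence interval. The sketch below uses the Wilson score interval, a standard modern choice that neither the book nor the review mentions. For 1 foreclosure out of 9 loans it gives roughly 2 to 44 per cent, too wide to support comparisons with other classes.

```python
import math

def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

low, high = wilson_interval(1, 9)
print(f"1 of 9 loans: rate {1/9:.3f}, interval {low:.3f} to {high:.3f}")
```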

A major contribution of the book is its accurate description of mortgage
lending practices by insurance companies. For this reason the book is par- 
ticularly useful to students of finance and to their teachers, most of whom 
have had difficulty in finding out much about mortgage lending. 


National Income Statistics of Various Countries, 1938-1948. Statistical Office of 
the United Nations, Lake Success, New York, 1950. Pp. vii, 249. 


This volume includes statistics which cover thirty-two countries and the
sources of which are national income studies which have been published 
since the first issue was completed in 1948. 

According to the Preface, “The introductory chapters summarize certain 
conceptual problems arising in defining national income with particular 
emphasis on recent developments. The investigation into the international 
comparability of available national income estimates has been brought up to 
date. The Statistical Office has also continued and refined its attempts to 
adjust the estimates in accordance with a proposed standard definition. 

“The chapter presenting a survey of available national income statistics 
now includes also the social accounts for all countries that have adopted this 
technique. 

“A new chapter has been added on the various sub-classifications of na- 
tional income presented in a way to facilitate international comparisons. The 
synoptic tables in appendix I have been extended to include series of national 
income figures over a longer period than was covered by the corresponding 
tables in the previous volume. 

“The present report includes data received by February 1950.” 


Handbook of Old-Age and Survivors Insurance Statistics. Federal Security 
Agency, Social Security Administration, Baltimore, Maryland. 1950. Pp. 121. 


This Handbook is described by its foreword as presenting “data on the
characteristics of workers in employment covered by old-age and sur- 
vivors insurance in the period 1937-47. The data reflect the cumulative wage 
and employment history of workers during the first 11 years of operation of 
the insurance system and also the experience of workers engaged in covered
employment during the calendar year 1947. The present Handbook is the 
seventh of a series started in 1939.... 
“The statistics in this Handbook are derived from the wage records main- 
tained by the Division of Accounting Operations.” 


The Social Areas of Los Angeles. Eshref Shevky and Marilyn Williams. Berkeley 
and Los Angeles: University of California Press, 1949. Pp. xvi, 172. $4.00.


Walter Firey, University of Texas


The Social Areas of Los Angeles is a presentation and analysis of statistical
data concerning selected social characteristics of the Los Angeles
population grouped by geographic areas. Its chief uniqueness probably lies in the
wealth of descriptive material which it has extracted from the 16th Census 
of the United States, and the interesting statistical constructions which 
have been made from this material. Extensive use has been made of elemen- 
tary statistical measures, such as indices, ratios, percentages, averages and 
correlation coefficients. Well prepared graphs and maps present these meas- 
ures in a clear and interesting manner. 

Chapter I considers “The City within the Framework of Social Trends.” 
In addition to summarizing rather familiar information on nation-wide fer- 
tility, migration and occupational trends the authors have indicated the 
distinctively urban-industrial character of California’s population and the 
special position which Los Angeles occupies in this pattern. 

Chapters II-V develop the methodology of the study. The authors’ princi- 
pal interest lies in delineating the social structure of greater Los Angeles 
insofar as this can be achieved through an analysis of geographically dis- 
tributed indices of social characteristics. Three variables have been selected 
and their values then plotted by census tracts for the metropolitan area. 
The variables and their construction are, briefly, as follows: (1) An index of 
Social Rank, constructed from data on occupation, education and income; 
(2) An index of Urbanization, derived from data on fertility, women in the 
labor force and physical character of neighborhoods; and (3) An index of
Segregation, derived from the number of persons in isolated ethnic groups
relative to the total population. Considerable skill and insight have been
shown in the construction of these indices. At the same time Shevky and
Williams have properly emphasized the limitations to which their measures 
are subject. 

In analyzing the relationships between their three variables the authors 
have adopted Lazarsfeld’s concept of “attribute space.” All the census tracts 
of the Los Angeles metropolitan area were placed, according to the values of 
their variables, into a three-dimensional attribute space. In this space the 
range of values for each of the three variables was divided into intervals— 
three for Social Rank and Urbanization, two for Segregation. Then, by 
means of graphs and maps, a number of relationships between these variables 
have been “located” both functionally and geographically. 
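
Placing tracts in an attribute space amounts to cutting each index into intervals and treating each combination of intervals as a cell, here 3 × 3 × 2 = 18 cells. The sketch below assumes indexes scored from 0 to 100 with hypothetical cut points and invented tract values; the book's actual interval boundaries are not given in this review.

```python
from collections import Counter

# Hypothetical cut points, assuming each index runs from 0 to 100.
RANK_CUTS = (33.3, 66.7)     # three intervals of Social Rank
URBAN_CUTS = (33.3, 66.7)    # three intervals of Urbanization
SEGREGATION_CUTS = (50.0,)   # two intervals of Segregation

def interval(value, cuts):
    """0-based interval number of `value` given ascending cut points."""
    return sum(value >= c for c in cuts)

def cell(tract):
    """Map a tract's (rank, urbanization, segregation) to one of 18 cells."""
    rank, urban, seg = tract
    return (interval(rank, RANK_CUTS),
            interval(urban, URBAN_CUTS),
            interval(seg, SEGREGATION_CUTS))

# Invented tract values, keyed by hypothetical tract labels.
tracts = {"A": (80, 20, 10), "B": (15, 70, 60), "C": (50, 50, 40), "D": (85, 25, 5)}
occupancy = Counter(cell(v) for v in tracts.values())
print(occupancy)
```

Counting tracts per cell, and mapping which cells each tract falls in, is the kind of functional and geographic location the review describes.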

The concluding chapter outlines a number of empirical generalizations 
which emerge from the study and which find support in data from other 
cities. At the end of the book are six appendices and two census tract maps. 
These provide the interested reader with the source material from which 
Shevky and Williams have made their constructions and from which further 
analyses can be made. The book must certainly rank as a major contribution 
to the literature in urban sociology and will very likely stimulate further use 
of the statistical and graphic devices which it has so skillfully employed. 
The Theory of Inbreeding. Ronald A. Fisher (Arthur Balfour Professor of Genetics, University of Cambridge), Edinburgh, London: Oliver and Boyd, 1949. Pp. viii, 120. 10/6 net.


Herluf H. Strandskov and Bertram L. Hanna, The University of Chicago


Mating systems within sexually reproducing species of plant or animal fall
naturally into two major categories: (1) random mating and (2) non-random or
assortative mating. The former
is the one most closely approximated in most intrabreeding populations. In 
this system, under ideal conditions, a sexually mature individual has an 
equal chance of becoming the mating partner of every mature individual of 
the opposite sex within the population. Non-random or assortative matings 
comprise a group of systems of which the more important are inbreeding and 
outbreeding. Both are defined by the genetic relationship of the mating
partners: under inbreeding the mating partners are consistently
more closely related than would be true under a system of random mating.
Examples are self-fertilization, brother-sister, and cousin matings. Out- 
breeding implies a system in which the mating partners are consistently 
more distantly related than would be true under a system of random mating. 

Since heredity consists of discrete particles called genes, which in sexual 
reproduction follow well established laws, it is possible to examine in precise 
mathematical terms the theoretical consequences of any system of mating 
through successive generations. This has been done by numerous investiga- 
tors not only for random mating but also for all forms of assortative mating. 
The consequence of any system of inbreeding is a gradual increase in homo- 
zygosis, i.e. in genetic purity, within a line as successive generations are bred. 
Since some degree of genetic homogeneity is often the goal of breeders of 
laboratory or domesticated animals or of cultivated plants, mathematical 
analyses of the theoretical consequences of the various systems of inbreeding 
have been given a major share of attention. Notable among the contributions 
to this general field are the pioneering studies of Jennings in 1914, ’16 and 
’17, and since then, those of Robbins, Wright, Haldane, Fisher, Dahlberg and 
Hogben. 
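The simplest system, self-fertilization, shows the increase in homozygosis directly. Assume a single locus with two alleles and no selection; these are our assumptions, not stated in the review. Homozygous parents then breed true, while an $Aa$ parent gives $\tfrac14\,AA$, $\tfrac12\,Aa$, $\tfrac14\,aa$. The heterozygote frequency $H_t$ therefore halves each generation:

$$H_t = \tfrac{1}{2}\,H_{t-1} = \left(\tfrac{1}{2}\right)^{t} H_0, \qquad 1 - H_t \longrightarrow 1 \quad (t \to \infty).$$

After seven generations $H_7 = H_0/128$, so fewer than 1 percent of the original heterozygotes remain.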

The book which is being reviewed is Fisher’s most recent contribution to 
inbreeding theory. The problem which he sets out to solve is one of determin- 
ing the rate at which homozygosis progresses in successive generations under 
any given system of inbreeding. His approach is to find the relative frequen- 
cies of the different kinds of matings, with respect to genetic composition, 
which are possible in successive generations. He finds that under any system
a constant, which he calls lambda ($\lambda$), may be obtained after several
generations; it gives the relative frequencies of the various kinds of possible
matings in a given generation in terms of those of the preceding one. He
defines lambda on page 28 as “the fraction by which each frequency is multiplied
so that $u_1 = \lambda u_0$, etc.” Thus lambda is an expression of the rate at
which progress is being made toward homozygosis. It is not, as Fisher cor- 
rectly emphasizes, an absolute measure of the amount of progress made. 
Such an absolute measure is found by multiplying the number of generations by
$\log(1/\lambda)$. As Fisher states, “This is a practical measure of what has been done. When it
reaches 2 units a great deal of progress has been made; when it reaches 5
units homozygosis is nearly complete.” Since the method presented relies
largely on matrix algebra, it may appropriately be referred to as the
generation matrix method. As we shall point out later, other methods exist.
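The generation matrix idea can be sketched in Python under assumptions of our own, not taken from Fisher's book:

- **Model.** A single locus with two alleles, no selection, and continued full-sib mating. Mating types are collapsed by the symmetry between the alleles.
- **Matrix entries.** These follow from Mendelian segregation.
- **Names and units.** `SIB_MATRIX`, `dominant_ratio`, `progress` and `generations_to_reach` are hypothetical. Natural logarithms are assumed for Fisher's progress measure.

Power iteration settles on the factor $\lambda$ by which the total frequency of still-segregating matings shrinks each generation. For full-sib mating this is $(1+\sqrt5)/4 \approx 0.809$, and for selfing it is $1/2$.

```python
import math

# Single-locus, two-allele model of continued full-sib mating (assumed for
# illustration). Fixed mating types (AA x AA, aa x aa) are absorbing and are
# omitted, so only still-segregating types are tracked. Entry [i][j] is the
# fraction of sib pairs from a type-j mating that form a type-i mating.
#   0: AA x Aa (or aa x Aa)   1: AA x aa   2: Aa x Aa
SIB_MATRIX = [
    [1 / 2, 0.0, 1 / 2],
    [0.0, 0.0, 1 / 8],
    [1 / 4, 1.0, 1 / 4],
]

# Selfing: only Aa is still segregating, and half of its offspring are Aa.
SELF_MATRIX = [[1 / 2]]


def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a column vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def dominant_ratio(m, generations=200):
    """Estimate lambda by power iteration: once the relative frequencies of
    the segregating matings settle, their total shrinks by this factor."""
    v = [1.0 / len(m)] * len(m)
    ratio = 0.0
    for _ in range(generations):
        w = mat_vec(m, v)
        total = sum(w)
        ratio = total / sum(v)
        v = [x / total for x in w]  # renormalize to avoid underflow
    return ratio


def progress(lam, generations):
    """Generations times log(1/lambda); natural log assumed."""
    return generations * math.log(1.0 / lam)


def generations_to_reach(lam, units):
    """Fewest whole generations for which progress() reaches `units`."""
    return math.ceil(units / math.log(1.0 / lam))


lam_sib = dominant_ratio(SIB_MATRIX)
lam_self = dominant_ratio(SELF_MATRIX)
print(round(lam_sib, 6))                   # 0.809017
print(generations_to_reach(lam_sib, 2))    # 10
print(generations_to_reach(lam_sib, 5))    # 24
print(generations_to_reach(lam_self, 5))   # 8
```

Power iteration is used here only to keep the sketch free of dependencies; `numpy.linalg.eigvals` on the same matrix would give the same largest eigenvalue.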

The major part of the mathematical development of the generation matrix 
method appears in Chapter III. Chapter I is an introduction to the history 
of inbreeding theory. Chapter II is a discussion of methods whereby lines 
may be inbred while still maintaining segregation in one or more pairs of 
genes. This chapter serves as a good background for an understanding of the 
generation matrix method. It closes with a valuable discussion on the use of 
segregating inbred lines in biologic research. Chapter IV, the last chapter, 
discusses briefly the application of the generation matrix method to other 
systems of inbreeding, i.e. other than the one analyzed initially. Three ap- 
pendices appear which deal with such topics as “Model mating systems,” 
“Time criterion for choice of mating,” and “The function of inbreeding in 
animal and plant improvement.” 

In evaluating this latest contribution by Fisher to inbreeding theory we 
find ourselves in possession of opposing opinions. It is true that Fisher pre- 
sents concisely yet clearly a method, involving the use of matrix algebra, 
whereby one may determine the progress made toward homozygosis in suc- 
cessive generations under any system of inbreeding. Nevertheless, it also is 
true that other methods have been developed and that the one presented by 
Fisher is not entirely original or different from these. We mention this fact 
because nowhere in the book does Fisher refer to the contributions of others. 
For the reader unfamiliar with the literature, we wish to point out that
recurrence equations, such as those employed by Fisher, were used by Jennings as
early as 1914 (American Naturalist, vol. 47) and again in 1916 (Genetics, 
vol. 1) in his pioneering analysis of the consequences of inbreeding. Further- 
more, the method presented by Fisher is essentially the same as the one in- 
troduced by Bartlett and Haldane in 1934-35 (Journal of Genetics, vols. 
29-31) and developed further by Haldane in 1937 (Journal of Genetics, vol. 
34) and in later papers. And finally it seems appropriate to emphasize that 
the major end results obtained by Fisher by the generation matrix method 
are the same as those reached by Wright decades ago by the method of path 
coefficients (1921, Genetics, vol. 6, 1931, Genetics, vol. 16, 1933, Proceedings 
of the National Academy of Sciences, vol. 19, 1938, and vol. 24, 1943, etc.). 
Fisher does not cite or refer to these contributions of Jennings, Haldane, and 
Wright. Perhaps one exception exists. On page 43 Fisher states that at- 
tempts to set up coefficients of inbreeding have been or are “unsatisfactory.” 
If he is referring to Wright’s coefficient F, as he probably is since this is
the one most widely calculated, and if he has reached his conclusion on the
grounds he presents, his statement falls flat, because those grounds
do not apply to Wright’s F. In fact, if they did, Fisher would be obliged to
consider his own expression “unsatisfactory,” because one may easily show
that it is identical to $\log_e(1-F)$. Actually, neither is “unsatisfactory.”
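The reviewers' point that the generation matrix results agree with Wright's can be checked numerically in a small Python sketch.

- **Setup.** Continued full-sib mating from non-inbred, unrelated founders, an assumption of ours.
- **Recurrence.** Wright's standard recurrence for this system, $F_t = (1 + 2F_{t-1} + F_{t-2})/4$, rewritten for the panmictic index $P_t = 1 - F_t$.
- **Name.** The function name `panmictic_indices` is hypothetical.

The characteristic equation of this recurrence, $4x^2 - 2x - 1 = 0$, has $(1+\sqrt5)/4 \approx 0.809$ as its largest root. That is the usual value of $\lambda$ for full-sib mating. The sketch shows:

- the ratio $P_t/P_{t-1}$ approaching $\lambda$;
- $-\log_e(1-F_t)$ and $t\log(1/\lambda)$ growing at the same rate, so that for large $t$ they agree up to an additive constant.

```python
import math

# Largest root of 4x**2 - 2x - 1 = 0, the characteristic equation of the
# full-sib recurrence used below (about 0.809).
LAMBDA_SIB = (1 + math.sqrt(5)) / 4


def panmictic_indices(generations):
    """Return [P_0, ..., P_n], where P_t = 1 - F_t under continued full-sib
    mating. Wright's F_t = (1 + 2*F_{t-1} + F_{t-2}) / 4 becomes
    P_t = P_{t-1} / 2 + P_{t-2} / 4. Working with P avoids the loss of
    precision in 1 - F once F is close to 1. Assumes non-inbred, unrelated
    founders: P_{-1} = P_0 = 1, so that F_1 = 1/4."""
    prev, cur = 1.0, 1.0  # P_{-1}, P_0
    out = [cur]
    for _ in range(generations):
        prev, cur = cur, cur / 2 + prev / 4
        out.append(cur)
    return out


p = panmictic_indices(60)
print([1 - x for x in p[1:4]])   # F_1..F_3: [0.25, 0.375, 0.5]
print(p[60] / p[59])             # close to LAMBDA_SIB

# -log_e(1 - F_t) minus t * log_e(1/lambda): settles to a constant.
gaps = [-math.log(p[t]) - t * math.log(1 / LAMBDA_SIB) for t in (20, 40, 60)]
print(gaps)
```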

In conclusion, we should like to state that this book by Fisher on inbreed- 
ing theory should be of interest to the statistician in that it shows excellently 
how mathematics may be used in the solution of an important biologic 
problem, and especially how matrix algebra may be employed. To the 
biologist, the book should be of interest for several reasons. In the first 
place, it discusses clearly and meaningfully a number of topics, which are 
not exclusively a part of the method which is developed, but which neverthe- 
less are important aspects of inbreeding theory. Secondly, the book presents 
a method by means of which one may calculate the progress made toward 
homozygosis in successive generations of inbreeding and, most importantly, 
by means of which one may verify the results reached decades ago by Wright 
by the method of path coefficients. The verification of Wright’s results should 
be reassuring to those who have relied on them for years and it should also 
cause others to become acquainted with the method of path coefficients. It 
may be that some breeders of laboratory or domesticated animals or of culti- 
vated plants, who do not take kindly to correlation methods, may find the 
generation matrix method more to their liking. Yet, this does not seem prob- 
able, because the latter method can hardly be considered easier. Matrix alge- 
bra is not easy for the less highly mathematically trained biologist to follow 
or apply. 

All in all the book is a valuable contribution to inbreeding theory, but in 
our opinion it would have been much more valuable if its author, with his 
keen understanding of biological problems and mastery of statistical tech- 
niques, had compared his method with those of others and had pointed out 
in precise terms in what ways his method is different from or superior to 
those of others. 